CHAPTER I


Introduction 

1.1 Outline of the Problem

For many applications, especially those in which a computer
is controlling a real-time process (e.g., telephone switching,
flight control of an aircraft or spacecraft, control of traffic in a
transportation system, etc.), reliability is a major factor in the
design of the system. The need for high reliability arises because
of the serious consequences errors may have in terms of danger to
human lives, loss of costly equipment, or disruption of business or
manufacturing operations. For example, it is economically unsound
to shut down a steel mill for even a short time in order to repair
a comparatively inexpensive controlling computer. The seriousness
of the consequences, of course, depends upon the application and must
be weighed against the cost of improving the reliability.

A number of techniques exist for improving computer reliability. 
One of the more obvious is the use of more reliable components. 
While the use of reliable components is clearly very important, it 
has been recognized that this technique alone is not sufficient to meet 
the requirements for modern ultrareliable computing systems [34]. 

Another general technique which is useful in some applications
is the use of masking redundancy such as Triple Modular Redundancy
or Quadded Logic [35]. One major drawback to masking redundancy
is that if failed components are not replaced and the mission time
is long, then the reliability of a system which uses masking
redundancy can actually be less than that of the corresponding simplex
system [25].

A third means of increasing system reliability and availability 
is through fault diagnosis and subsequent system reconfiguration or 
repair. For example, a computer designed to control telephone
switching, the No. 1 Electronic Switching System (ESS), contains
duplicates of each module, and fault diagnosis is achieved primarily
by dynamically comparing the outputs of both modules [11]. Once
the faulty module is identified, it is repaired manually with diagnostic
help from the fault-free computer. Another ultra-reliable
computer, the Jet Propulsion Laboratory Self-Testing and Repairing
(STAR) computer, also makes use of modularity and standby
sparing [4].

One means of performing fault diagnosis is to continuously 
monitor the performance of the system, as it is being used, to deter- 
mine whether its actual behavior is tolerably close to the intended 
behavior. It is this sort of monitoring which we mean by the term
"on-line diagnosis." Others have used the term "error detection"
to refer to this sort of monitoring ([22], [23]).

Implementation of on-line diagnosis may be external to the
system, both internal and external, or completely internal. In the
last extreme, on-line diagnosis is sometimes referred to as "self-diagnosis"
or "self-checking" ([8], [9]).

There are two essential requirements for on-line diagnosis.
The first is redundancy; more than the minimum amount of information
must be processed. The second is verifiability; the redundant
information must be checked for consistency.

The signals generated by a monitoring device can be used in
many ways. For example, the IBM System/360 utilizes checking
circuits to detect errors [6]. The signals generated by these
circuits are used in some models to freeze the computer so that the
instruction which was currently executing may be retried if possible,
and to assist in the checkout and repair of the computer if the automatic
retry attempt fails. Ultra-reliable computers typically use
the signals generated by the monitoring device to provide the computer
system with the information it needs to automatically reconfigure
itself so as to avoid using any faulty circuits. One other use for such
signals is to simply inform the system user that the system is not
operating properly and that there may be errors in his data.

In general, on-line diagnosis is used to verify that the system 
is operating properly; or conversely, to signal that it is in need of 
repair. In most computer systems this task is also performed in
some part by "off-line diagnosis." By off-line diagnosis we are
referring to the process of removing the system from its normal
operation and applying a series of prearranged tests to determine 
whether any faults are present in the system. There are major 
differences between on-line and off-line diagnosis and it is important 
to be aware of the capabilities and the limitations of each. 

One basic difference is that on-line diagnosis is a continuous 
process whereas off-line diagnosis has a periodic nature. Transient 
faults are difficult to diagnose with off-line diagnosis because if a 
fault is transient in nature it may not be in the system when it is 
tested. On the other hand, since on-line diagnosis is a continuous 
monitoring process both permanent and transient faults can be 
diagnosed. It has been recognized by Ball and Hardie [5] and
others that intermittents do occur frequently, and that finding an
orderly means to diagnose them is an important unsolved problem.
Thus the inability of off-line diagnosis to deal satisfactorily with
transients is a severe limitation.

Another basic difference is that the delay between the occurrence
of a fault and its subsequent detection is generally greater for off-line
than for on-line diagnosis. Recovery after a fault has been diagnosed
may sometimes be achieved by reconfiguration and restarting.
However, in a real-time application irrepeatable or nonreversible
events may take place if an error occurs and is not immediately
detected. In any application, if there is a delay between the occurrence
of an error and the subsequent diagnosis of a fault, then
contamination of data bases may occur, thus making restarting
difficult. For these reasons, the inherent delay associated with
off-line diagnosis can be a serious limitation.

One further difference between on-line and off-line diagnosis 
is that with off-line diagnosis the system must be removed from its 
normal operation to apply the tests. This also may not be acceptable 
in a real-time application. 

The cost of either form of diagnosis depends on the nature of 
the system to be diagnosed, the technology to be used in building 
the system, and the degree of protection against faulty operation 
that is required. With on-line diagnosis the cost is almost totally
in the design, construction, and maintenance of extra hardware.
With off-line diagnosis the cost is in the initial generation of the tests
and in the subsequent storage and running of these tests.

In general, off-line diagnosis is useful for factory testing and 
for applications where immediate knowledge of any faulty behavior 
is not essential. Off-line diagnosis is also useful for locating the 
source of trouble once such trouble is indicated by on-line diagnosis. 
For example, as stated earlier, the Bell System's No. 1 ESS uses duplication
and comparison as its primary error detection scheme. But
once an error has been detected, off-line diagnosis is used to determine
which processor exhibited the erroneous behavior and to locate
the faulty module in that processor.

In the Design Techniques for Modular Architecture for Reliable
Computing Systems (MARCS) study a more integrated use of on-line
diagnosis is proposed whereby a number of checking circuits
observe the performance of various parts of the computer [8].
With a scheme such as this, information about the location of a
fault can be obtained from knowledge of which checking circuit
indicated the trouble.

Both on-line and off-line diagnosis have been used to check 
the operation of computers from the very first machines until the 
present time. In a short paper published in 1957, Eckert [12] informs
us that off-line diagnosis was relied upon for the ENIAC computer,
that the BINAC system had duplicate processors, and that the UNIVAC
used a more economical on-line diagnosis scheme involving 35 checking
circuits. During the past decade, however, the development of
theory and techniques for fault diagnosis in digital systems and
circuits has focused mainly on problems of off-line diagnosis (see
[9] and [14] for example).

An alternative means of performing diagnosis has been investigated
by White [36]. His novel scheme is similar to on-line diagnosis
in that it involves redundant processing of information and subsequent
checking for consistency. However, with his scheme the redundancy
is in time rather than in space. After every operation is performed,
a related operation is initiated which uses the same circuitry but
with different signals. The results of these two operations are then
checked for consistency.

One other approach to diagnosis is simply to have human users 
or observers of the system watch for obvious misbehavior. Since 
faults often give rise to behaviors which are clearly erroneous, many 
faults can be detected in this manner. The effectiveness of this 
method is highly dependent upon the individual system and program, 
and is exceedingly difficult to evaluate. It seems reasonable to 
assume, however, that this method is less effective than any of the 
methods previously discussed. Certainly, this method is unacceptable 
for many applications. 

1.2 Brief Survey of the Literature

The work that has been done on on-line diagnosis is mainly in 
the area of techniques. One early paper is Kautz's study [21] of 
fault detection techniques for combinational circuits. In this paper 
he investigated a number of techniques including the use of codes 
and the possibility of greater economy if immediate detection of 
errors was not necessary. Many of the more common on-line 
diagnosis techniques have been gathered together and published in 
a book by Sellers, Hsiao, and Bearnson [33]. Much of what is in 
this book and a large portion of the techniques that can be found 
elsewhere in the literature are concerned with special circuits 
such as adders and counters. For example, see the work of Avizienis
[3], Rao [32], Dorr [10], and Wadia [37].

Relatively little work can be found on the theory of on-line 
diagnosis. As with the work on on-line diagnosis techniques, much 
of the theory of on-line diagnosis focuses on arithmetic units. 

In one of the earliest works of a theoretical nature, Peterson
[29] showed that an adder can be checked using a completely independent
circuit which adds the residues, modulo some base, of the
operands. He went on to show that any independent check of this
type was a residue class check. Further theoretical work concerning
the diagnosis of arithmetic units using residue codes can be
found in Massey [24] and Peterson [31].
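
As a small illustration of the residue check described above, the following sketch compares the residue of an adder's output with the sum of the operand residues. The base 3, the function names, and the injected stuck-at fault are assumptions chosen for illustration and are not taken from [29].

```python
def residue_check(a: int, b: int, adder_output: int, base: int = 3) -> bool:
    """Return True if adder_output is consistent with a + b modulo base."""
    # Independent check circuit: add the residues of the operands.
    expected_residue = ((a % base) + (b % base)) % base
    return adder_output % base == expected_residue


def faulty_add(a: int, b: int) -> int:
    """Hypothetical faulty adder: bit 2 of the sum is stuck at 1."""
    return (a + b) | 0b100


good = residue_check(5, 9, 5 + 9)             # fault-free sum passes the check
bad = residue_check(1, 2, faulty_add(1, 2))   # the sum 3 is corrupted to 7
```

An error whose magnitude is a multiple of the base leaves the residue unchanged and so escapes a check of this kind, which is one reason the choice of base matters.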

An early theoretical result of a more general nature was published
by Peterson and Rabin [30]. They showed that combinational circuits
can differ greatly in their inherent diagnosability and that in some
cases virtual duplication is necessary.

A later and very important paper is that of Carter and Schneider
[7]. They propose a model for on-line diagnosis which involves a
system and an external checker. The input and output alphabets of
the system are encoded and the checker detects faults by indicating
the appearance of a non-code output. A system is self-checking
if for every fault in some prescribed set, (i) the system produces
a non-code output for at least one code space input, and (ii) the
system never produces incorrect code space outputs for code space
inputs. Thus, (i) ensures that every fault can be detected during normal
usage, and (ii) ensures that if no fault has been detected then the output
can be relied upon to be correct. The checkers that they consider are
also self-checking. Using this model they prove that any system can be
designed to be self-checking for the class of single stuck-at faults.

Anderson [1] has named property (i) "self-testing" and property
(ii) "fault-secure," and he has investigated these properties for
combinational networks. In Chapter III it is shown that the notion
of diagnosis considered in this study is a generalization of the
fault-secure property.

1.3 Synopsis of the Report

This report describes an investigation of theory and techniques 
applicable to the on-line diagnosis of sequential systems. 

With the decreasing cost of logic and the increasing use of computers
in real-time applications where erroneous operation can result in
the loss of human life and/or large sums of money, the use of on-line
diagnosis can be expected to increase greatly in the near future.
The importance of this area along with the relative lack of theoretical
results is our motivation for initiating this study of on-line
diagnosis.

The purpose of this investigation is to further the currently 
insufficient store of information on the subject of on-line diagnosis. 
The formal approach taken in this report leads to a fuller understanding
of current on-line diagnosis practices and suggests
generalizations of known techniques. It also provides a framework 
for evaluating the advantages and limitations of the various on-line 
diagnosis schemes. 

In Chapter II, a complete model for the study of on-line diagnosis
is developed. First an appropriate class of system models is
formulated which can serve as a basis for a theoretical study of
on-line diagnosis. Then notions of realization, fault, fault tolerance,
and diagnosability are formalized which have meaningful interpretations
in the context of on-line diagnosis. The following chapters are
all concerned with the properties of the notion of diagnosis which is
introduced in this chapter.

Chapter III contains some elementary properties of diagnosis 
which are independent of the particular class of faults under consideration.
The results of this chapter help to give a basic understanding
of on-line diagnosis and are used in the later chapters. 

Chapter IV is concerned with the diagnosis of the set of unrestricted
faults. This set of faults is simply the set of all faults of
the system under consideration. The major result of this chapter 
gives a lower bound on the amount of redundancy that must be employed 
by any technique which can be used for unrestricted fault diagnosis. 

In Chapter V, the use of inverse systems for the diagnosis 
of unrestricted faults is considered. Inverse systems are formally 
introduced, and a partial characterization of those inverse systems 
which can be used for unrestricted fault diagnosis is obtained. Since 
not every system has an inverse system, let alone one which is 
suitable for unrestricted fault diagnosis, it is not always possible 
to apply this technique directly. However, it is shown that every 
system has a realization upon which this technique can be successfully
applied.

In Chapter VI, the diagnosis of systems which are structurally 
decomposed and are represented as a network of smaller systems 
is studied. The fault set considered here is the set of faults which 
only affect one component system in the network. A characterization 
of those networks which can be diagnosed using a purely combinational
detector is achieved. A technique is given which can be
used to realize any network by a network which is diagnosable in 
the above sense. Limits are found on the amount of redundancy 
involved in any such technique. 



CHAPTER II 


A Model for the Study of On-Line Diagnosis

In this chapter we develop the model which we will be using in 
this theoretical study of on-line diagnosis. 

We begin by introducing a new class of system models, called
"resettable discrete-time systems," which will serve as the basis of
our study. Within this model we will consider a fault of a system S to
be a transformation of S into another system S' at some time t. The
resulting faulty system is taken to be the system which looks like S up
to time t and like S' thereafter.

Next the companion notions of fault tolerance and error are
defined in terms of the resulting system being able to mimic some
desired behavior.

Finally, our notion of on-line diagnosis is introduced. This
notion involves an external detector and a maximum time delay within
which every error caused by a fault in some prescribed set must be
detected.

2.1 Resettable Discrete-Time Systems

On-line diagnosis is inherently a more complex process than off-line
diagnosis because of two complicating factors: i) it has to deal with
input over which it has no control and ii) faults can occur as the system
is being diagnosed. We would like to build a theory of on-line diagnosis
using conventional models of time-invariant (stationary, fixed) systems
(e.g., sequential machines, sequential networks, etc.). However,
due to the second factor mentioned above these conventional models
can no longer be used to represent the dynamics of the system as it is
being diagnosed. A system which is designed and built to behave in a
time-invariant manner becomes a time-varying system as faults occur
while it is in use. Therefore, a more general representation based
on time-varying systems is required. Based on this fundamental observation
we have developed what we believe to be an appropriate model
for the study of on-line diagnosis.

Definition 2.1: Relative to the time-base $T = \{\ldots, -1, 0, 1, \ldots\}$, a
discrete-time system (with finite input and output alphabets) is a system

    $S = (I, Q, Z, \delta, \lambda)$

where $I$ is a finite nonempty set, the input alphabet
      $Q$ is a nonempty set, the state set
      $Z$ is a finite nonempty set, the output alphabet
      $\delta: Q \times I \times T \to Q$, the transition function
      $\lambda: Q \times I \times T \to Z$, the output function.

The interpretation of a discrete-time system is a system which,
if at time t is in state q and receives input a, will at time t emit output
symbol $\lambda(q, a, t)$ and at time t + 1 be in state $\delta(q, a, t)$. In the special
case where the functions $\delta$ and $\lambda$ are independent of time (i.e., are
time-invariant), the definition reduces to that of a (Mealy) sequential
machine. In the discussion that follows we will assume, unless otherwise
qualified, that S is finite-state (i.e., $|Q| < \infty$).
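
To make this interpretation concrete, the following minimal sketch represents a discrete-time system as a record holding time-dependent transition and output functions. The class name, field names, and the one-bit toggle used as the time-invariant special case are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Tuple

State = Hashable
Symbol = Hashable


@dataclass(frozen=True)
class DiscreteTimeSystem:
    """S = (I, Q, Z, delta, lam) with time-dependent delta and lam."""
    inputs: frozenset
    states: frozenset
    outputs: frozenset
    delta: Callable[[State, Symbol, int], State]   # transition function
    lam: Callable[[State, Symbol, int], Symbol]    # output function

    def step(self, q: State, a: Symbol, t: int) -> Tuple[State, Symbol]:
        """Return (state at time t + 1, output emitted at time t)."""
        return self.delta(q, a, t), self.lam(q, a, t)


# Time-invariant special case: a one-bit toggle, i.e. a Mealy machine.
toggle = DiscreteTimeSystem(
    inputs=frozenset({0, 1}),
    states=frozenset({0, 1}),
    outputs=frozenset({0, 1}),
    delta=lambda q, a, t: q ^ a,   # ignores t
    lam=lambda q, a, t: q,         # ignores t
)

next_state, output = toggle.step(0, 1, t=5)
```

Because both functions take the time t as an argument, the same record can describe a system whose structure changes while it is in use.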

To describe the behavior of a system, we first extend the transition
and output functions to input sequences in the following natural way.
If $I^*$ is the set of all finite-length sequences over $I$ (including the null
sequence $\Lambda$) then:

    $\bar{\delta}: Q \times I^* \times T \to Q$

where, for all $q \in Q$, $a, a_1, \ldots, a_n \in I$, $t \in T$:

    $\bar{\delta}(q, \Lambda, t) = q$
    $\bar{\delta}(q, a, t) = \delta(q, a, t)$
    $\bar{\delta}(q, a_1 a_2 \cdots a_n, t) = \delta(\bar{\delta}(q, a_1 a_2 \cdots a_{n-1}, t), a_n, t + n - 1)$.

Similarly, if $I^+ = I^* - \{\Lambda\}$:

    $\bar{\lambda}: Q \times I^+ \times T \to Z$

where, for all $q \in Q$, $a, a_1, \ldots, a_n \in I$, $t \in T$:

    $\bar{\lambda}(q, a, t) = \lambda(q, a, t)$
    $\bar{\lambda}(q, a_1 a_2 \cdots a_n, t) = \lambda(\bar{\delta}(q, a_1 a_2 \cdots a_{n-1}, t), a_n, t + n - 1)$.

Henceforth $\bar{\delta}$ and $\bar{\lambda}$ will be denoted simply as $\delta$ and $\lambda$.

Relative to these extended functions, the behavior of S in state q
is the function

    $\beta_q: I^+ \times T \to Z$

where

    $\beta_q(x, t) = \lambda(q, x, t)$.

Thus, if the state of the system is q and it receives input sequence x
starting at time t, then $\beta_q(x, t)$ is the output emitted when the last
symbol in x is received (i.e., the output at time $t + |x| - 1$, where $|x|$ is
the length of x).
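
The extended transition function and the behavior function can be sketched directly from these definitions. In the sketch below the function names and the modulo-4 counter, whose output is forced to 0 from time 3 onward, are hypothetical and serve only to show how the time argument advances by one for each input symbol.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Symbol = Hashable
Delta = Callable[[State, Symbol, int], State]
Lam = Callable[[State, Symbol, int], Symbol]


def delta_ext(delta: Delta, q: State, x: Sequence[Symbol], t: int) -> State:
    """Extended transition function: state reached after applying x from time t."""
    for k, a in enumerate(x):
        q = delta(q, a, t + k)   # the k-th symbol (0-based) is consumed at time t + k
    return q


def behavior(delta: Delta, lam: Lam, q: State, x: Sequence[Symbol], t: int) -> Symbol:
    """beta_q(x, t): output emitted when the last symbol of nonempty x is received."""
    if not x:
        raise ValueError("behavior is defined only for nonempty input sequences")
    q_before_last = delta_ext(delta, q, x[:-1], t)
    return lam(q_before_last, x[-1], t + len(x) - 1)


# Hypothetical time-varying example (not from the report).
delta = lambda q, a, t: (q + a) % 4
lam = lambda q, a, t: (q + a) % 4 if t < 3 else 0

out_early = behavior(delta, lam, 0, [1, 1], t=0)        # last symbol at time 1
out_late = behavior(delta, lam, 0, [1, 1, 1, 1], t=0)   # last symbol at time 3
```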

Many investigations of on-line diagnosis and fault tolerance have 
studied redundancy schemes such as duplication and triplication. 
Typically they have not dealt with the problem of starting each copy of 
a machine in the same state. In this study we will be examining these 
schemes and others for which the same problem arises. Since many 
existing systems have reset capabilities, and since this feature solves 
the above synchronizing problem, we will use a special type of system
for which the reset capabilities are explicitly specified. This explicit
specification of the reset capability is essential since it is an important
part of the total system and it may be subject to failure.

Definition 2.2: A resettable discrete-time system (resettable system)
is a system

    $S = (I, Q, Z, \delta, \lambda, R, \rho)$

where $(I, Q, Z, \delta, \lambda)$ is a discrete-time system
      $R$ is a finite nonempty set, the reset alphabet
      $\rho: R \times T \to Q$, the reset function.

A resettable system is resettable in the sense that if reset r is
applied at time t - 1 then $\rho(r, t)$ is the state at time t. This method of
specifying reset capability is a matter of convenience. This feature
could just as well have been incorporated as a restriction on the transition
function relative to a distinguished subset of input symbols called
the reset alphabet. Thus a resettable discrete-time system can indeed
be regarded as a special type of discrete-time system. If $\delta$, $\lambda$, and $\rho$
are all independent of time the definition reduces to that of a resettable
sequential machine. Thus a resettable machine can be viewed as a
resettable system which is invariant under time-translations.
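
The remark that the reset capability could instead be folded into the transition function can be sketched as follows. The helper name, the assumption that the input and reset alphabets are disjoint, and the modulo-3 counter are all introduced here for illustration only.

```python
from typing import Callable, Hashable

State = Hashable
Symbol = Hashable


def fold_reset_into_delta(
    delta: Callable[[State, Symbol, int], State],
    rho: Callable[[Symbol, int], State],
    resets: frozenset,
) -> Callable[[State, Symbol, int], State]:
    """Build a transition function over the enlarged alphabet (inputs plus resets).

    A reset symbol r applied at time t forces the state at time t + 1 to be
    rho(r, t + 1), regardless of the current state. Assumes the input and
    reset alphabets are disjoint.
    """
    def delta_with_reset(q: State, a: Symbol, t: int) -> State:
        if a in resets:
            return rho(a, t + 1)
        return delta(q, a, t)
    return delta_with_reset


# Hypothetical modulo-3 counter with a single reset "r0" to state 0.
delta = lambda q, a, t: (q + 1) % 3
rho = lambda r, t: 0
step = fold_reset_into_delta(delta, rho, frozenset({"r0"}))

after_input = step(1, "tick", 7)   # ordinary input advances the counter
after_reset = step(2, "r0", 7)     # reset overrides the current state
```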

Given a resettable system we can view it as a system organized
as in Fig. 2.1.

[Figure not reproduced; its input lines are labeled $r \in R$ and $a \in I$.]

Fig. 2.1. Schematic Diagram for $S = (I, Q, Z, \delta, \lambda, R, \rho)$


In many discussions we will not be directly concerned with the 
output function of a system, but rather we will want to focus our 
attention upon the state transitions. This motivates the following 
definition. 

Definition 2.3: A resettable discrete-time system $S = (I, Q, Z, \delta, \lambda, R, \rho)$
is a resettable state system if $Z = Q$ and $\lambda(q, a, t) = q$ for all $q \in Q$,
$a \in I$, and $t \in T$.

Since the output alphabet and output function of a resettable state
system need not be explicitly specified, a resettable state system
$S = (I, Q, Z, \delta, \lambda, R, \rho)$ will be denoted by the 5-tuple $(I, Q, \delta, R, \rho)$.

This formulation of resettable state systems as special types of 
resettable systems allows us to directly apply the following theory of 
on-line diagnosis to state machines. 

Notation: Resettable systems will be denoted by $S$, $S'$, $S_1$, $S_2$, etc.,
and resettable machines will be denoted by $M$, $M'$, $M_1$, $M_2$, etc.
Unless otherwise specified, $M$ will denote the resettable machine
$(I, Q, Z, \delta, \lambda, R, \rho)$; $M'$ will denote the resettable machine
$(I', Q', Z', \delta', \lambda', R', \rho')$; and so forth. $\mathcal{S}(I, Z, R)$ will denote the set of systems with
input alphabet $I$, output alphabet $Z$, and reset alphabet $R$. That is,

    $\mathcal{S}(I, Z, R) = \{S' \mid S' = (I, Q', Z, \delta', \lambda', R, \rho')\}$.

$\mathcal{M}(I, Z, R)$ will denote the corresponding set of resettable machines.

Definition 2.4: A resettable sequential machine $M = (I, Q, Z, \delta, \lambda, R, \rho)$
is memoryless or combinational if $|Q| = 1$.

The triple $(I, Z, \lambda)$ where $\lambda: I \to Z$ will be used to denote any
memoryless machine with input alphabet I, output alphabet Z, and
output function $\lambda$. The memoryless machine $M = (I, Z, \lambda)$ is said to
realize the function $\lambda: I \to Z$.

We will represent sequential machines in the usual manner,
i.e., via transition tables or state graphs. Resettable machines are
represented by minor extensions of these two methods. The transition
table of a resettable machine is identical to that of a machine with the
addition of one column on the right to accommodate the reset function.
If $\rho(r) = q$ then r will appear in this additional column in the row
corresponding to state q. Similarly, the state graph of a resettable
machine is identical to that of a machine with the addition of one short
arrow for each $r \in R$. This arrow will be labeled r and will point
to state $\rho(r)$.

Example 2.1: Let $M_1$ be the sequence generator with reset alphabet
{0} and input alphabet {1} which has been implemented by the circuit
in Fig. 2.2.

The transition table and the state graph for $M_1$ are shown in
Figs. 2.3 and 2.4.

    State | Input 1 | Reset
    ------+---------+------
     00   |  01/0   |  0
     01   |  11/1   |
     10   |  00/1   |
     11   |  10/1   |

Fig. 2.3. Transition Table for $M_1$

Fig. 2.4. State Graph for $M_1$

The circuit in Fig. 2.2 is also an implementation of a similar machine
$M_2$ with input alphabet {0, 1}. The state graph for $M_2$ is shown in
Fig. 2.5.

Thus, in $M_2$ the input symbol "0" can be interpreted as an input or as
a reset. In $M_2$ the outputs for input 0 are explicitly specified whereas
in $M_1$ they may be regarded as classical "don't cares."
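
The sequence generated by the first machine of this example can be traced with a short sketch. The dictionary below encodes the entries of the transition table in Fig. 2.3 as next state and output under the single input symbol 1; the variable and function names are assumptions made for illustration.

```python
# Transition table of the sequence generator as read from Fig. 2.3:
# state -> (next state, output) under the single input symbol 1.
TABLE_M1 = {
    "00": ("01", 0),
    "01": ("11", 1),
    "11": ("10", 1),
    "10": ("00", 1),
}
RESET_M1 = {0: "00"}   # reset symbol 0 leads to state 00


def run_m1(reset: int, n_inputs: int) -> list:
    """Apply a reset, then n_inputs copies of the input symbol 1; return the outputs."""
    state = RESET_M1[reset]
    outputs = []
    for _ in range(n_inputs):
        state, z = TABLE_M1[state]
        outputs.append(z)
    return outputs


sequence = run_m1(reset=0, n_inputs=8)   # [0, 1, 1, 1, 0, 1, 1, 1]
```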

We can view a particular discrete-time system as a system which
looks like some machine in one time interval, like another machine in
another interval, and so on. This is also a good means of specifying a system.

[Figure not reproduced; horizontal axis: Time.]

Fig. 2.6. A Discrete-Time System

Example 2.2: Suppose that $M_1$ was implemented as in Fig. 2.2 and
that this circuit operated correctly up to time 100 when gate 2 became
stuck-at-0. What actually existed was not a resettable machine but a
(time-varying) resettable system S which looks like $M_1$ up to time 100
and like a different machine, say $M_1'$, thereafter. The graph for $M_1'$ is
shown in Fig. 2.7.

Fig. 2.7. Resettable Machine $M_1'$

We can represent S as follows:

    $S = \begin{cases} M_1 & \text{for } t < 100 \\ M_1' & \text{for } t \geq 100. \end{cases}$

By this we mean that $I = I_1 = I_1'$ and likewise for Q, Z, and R, and that

    $\delta(q, a, t) = \begin{cases} \delta_1(q, a) & \text{for } t < 100 \\ \delta_1'(q, a) & \text{for } t \geq 100 \end{cases}$

and similarly for $\lambda$ and $\rho$.
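
A time-varying system of this kind can be sketched by switching between two transition tables at the fault time. The fault-free table follows Fig. 2.3; the faulty table below is a purely hypothetical stand-in, since the actual faulty machine is defined by Fig. 2.7 and is not reproduced here.

```python
M1 = {"00": ("01", 0), "01": ("11", 1), "11": ("10", 1), "10": ("00", 1)}
# Hypothetical stand-in for the faulty machine (NOT the machine of Fig. 2.7).
M1_FAULTY = {"00": ("00", 0), "01": ("10", 0), "11": ("10", 1), "10": ("00", 1)}
FAULT_TIME = 100


def delta_lam(q: str, a: int, t: int):
    """Time-varying (next state, output) of S: fault-free table before the fault."""
    table = M1 if t < FAULT_TIME else M1_FAULTY
    return table[q]


def run(q: str, t_start: int, n: int) -> list:
    """Apply the input symbol 1 at times t_start, ..., t_start + n - 1."""
    outputs = []
    for t in range(t_start, t_start + n):
        q, z = delta_lam(q, 1, t)
        outputs.append(z)
    return outputs


before = run("00", t_start=0, n=4)    # entirely before the fault
across = run("00", t_start=96, n=8)   # straddles t = 100
```

With this stand-in, the output sequence is correct through time 99 and then departs from the fault-free pattern, which is the kind of deviation an on-line detector is meant to catch.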


For resettable systems we take the definitions of $\bar{\delta}$, $\bar{\lambda}$, and $\beta_q$
to be the same as those for systems. It is also convenient in the case
of resettable systems to specify behavior relative to a reset input r
that is released at time t, that is, the behavior of S for condition (r, t)
($r \in R$, $t \in T$) is the function

    $\beta_{r,t}: I^+ \to Z$

where

    $\beta_{r,t}(x) = \beta_{\rho(r,t)}(x, t)$.

If t = 0, $\beta_{r,0}$ is referred to as the behavior of S for initial reset r
and is denoted simply as $\beta_r$.

It is useful to extend the behavior function $\beta_{r,t}$ in a natural
manner to represent the sequence to sequence behavior of S. For
$r \in R$ and $t \in T$

    $\hat{\beta}_{r,t}: I^+ \to Z^+$

where for all $a_1, \ldots, a_n \in I$

    $\hat{\beta}_{r,t}(a_1 \cdots a_n) = \beta_{r,t}(a_1)\, \beta_{r,t}(a_1 a_2) \cdots \beta_{r,t}(a_1 a_2 \cdots a_n)$.
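
The sequence to sequence behavior can be computed in a single pass, emitting one output per input symbol. The sketch below assumes the system is given as three Python callables for the transition, output, and reset functions; the modulo-4 counter used to exercise it is hypothetical.

```python
from typing import Callable, Hashable, List, Sequence


def sequence_behavior(
    delta: Callable, lam: Callable, rho: Callable,
    r: Hashable, t: int, x: Sequence[Hashable],
) -> List[Hashable]:
    """Output sequence produced by input x after reset r is released at time t."""
    q = rho(r, t)                          # state at time t
    outputs = []
    for k, a in enumerate(x):
        outputs.append(lam(q, a, t + k))   # output for the prefix ending at a
        q = delta(q, a, t + k)
    return outputs


# Hypothetical time-invariant counter whose output is its current state.
delta = lambda q, a, t: (q + a) % 4
lam = lambda q, a, t: q
rho = lambda r, t: 0

outs = sequence_behavior(delta, lam, rho, r="r", t=0, x=[1, 1, 1])
```

The last element of the returned list is the value of the single-output behavior function on the whole input sequence.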


We will now introduce a few properties of resettable machines
which will be important to our developing model of on-line diagnosis.
A more complete treatment of the properties of resettable machines
can be found in the appendix.

We define these properties for resettable machines rather than
for resettable systems because we will be applying them to "fault-free"
systems, which in this study will always be time-invariant.

We begin with some concepts of "reachability." Let M be a
resettable machine. The reachable part of M, denoted by P, is the
set

    $P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^*\}$.

M is reachable if P = Q. M is $\ell$-reachable if

    $P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^* \text{ and } |x| \leq \ell\}$.

An elementary result of graph theory states that in a directed
graph with n points, if a point v can be reached from a point u then
there is a path of length n - 1 or less from u to v. An immediate consequence
of this is that any machine M is $(|P| - 1)$-reachable.
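
The reachable part and the smallest bound on the input length needed to reach it can be computed by breadth-first search from the reset states. The sketch below assumes a time-invariant machine whose transition function is stored as a dictionary; the four-state machine is a hypothetical example in which one state cannot be reached.

```python
from collections import deque


def reachable_levels(delta, reset_states, inputs):
    """Breadth-first search from the reset states; returns {state: shortest distance}."""
    dist = {q: 0 for q in reset_states}
    queue = deque(dist)
    while queue:
        q = queue.popleft()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in dist:
                dist[nxt] = dist[q] + 1
                queue.append(nxt)
    return dist


# Hypothetical machine: state "d" cannot be reached from the reset state "a".
Q = {"a", "b", "c", "d"}
delta = {
    ("a", 0): "b", ("a", 1): "a",
    ("b", 0): "c", ("b", 1): "a",
    ("c", 0): "c", ("c", 1): "a",
    ("d", 0): "a", ("d", 1): "d",
}
dist = reachable_levels(delta, reset_states=["a"], inputs=[0, 1])
P = set(dist)                 # the reachable part
ell = max(dist.values())      # smallest bound on |x| that still yields all of P
is_reachable = (P == Q)
```

In this example the bound found by the search equals the number of reachable states minus one, the worst case allowed by the graph-theoretic result quoted above.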

Let $M, M' \in \mathcal{M}(I, Z, R)$. M is equivalent to M' (written $M \equiv M'$)
if $\beta_r = \beta'_r$ for all $r \in R$. Two states $q \in Q$ and $q' \in Q'$ are equivalent
($q \equiv q'$) if $\beta_q = \beta'_{q'}$. It is easily verified that these are both equivalence
relations, the first on $\mathcal{M}(I, Z, R)$ and the second on the states of machines
in $\mathcal{M}(I, Z, R)$.

A resettable machine M is reduced if for all $q, q' \in P$, $q \equiv q'$
implies $q = q'$. A basic result of sequential machine theory states that
for every machine there is an equivalent reduced machine and that this
machine is unique up to isomorphism. The corresponding result for
resettable machines is given in the appendix.

A concept which is central to sequential machine theory is that of
a "realization." The corresponding resettable machine concept will
be very important to our theory of on-line diagnosis. We will introduce
it by first stating Meyer and Zeigler's definition of realization for
sequential machines [27].

Definition 2.5: If $M$ and $\tilde{M}$ are sequential machines then $M$ realizes
$\tilde{M}$ if there is a triple of functions $(\sigma_1, \sigma_2, \sigma_3)$ where $\sigma_1: (\tilde{I})^+ \to I^+$ is
a semigroup homomorphism such that $\sigma_1(\tilde{I}) \subseteq I$, $\sigma_2: \tilde{Q} \to Q$, and
$\sigma_3: Z' \to \tilde{Z}$ where $Z' \subseteq Z$, such that for all $\tilde{q} \in \tilde{Q}$

    $\tilde{\beta}_{\tilde{q}} = \sigma_3 \circ \beta_{\sigma_2(\tilde{q})} \circ \sigma_1$.

It has been shown by Leake [23] that this strictly behavioral 
definition of realization is equivalent to the structurally oriented 
definition of Hartmanis and Stearns [16]. 

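If $M$ and $\tilde{M}$ are resettable machines then our definition of
realization is somewhat different. Inherent in this definition is our
presupposition that a resettable system will be reset before every use.

Definition 2.6: If $M$ and $\tilde{M}$ are two resettable machines then $M$ realizes
$\tilde{M}$ if there is a triple of functions $(\sigma_1, \sigma_2, \sigma_3)$ where $\sigma_1: (\tilde{I})^+ \to I^+$ is
a semigroup homomorphism such that $\sigma_1(\tilde{I}) \subseteq I$, $\sigma_2: \tilde{R} \to R$, and
$\sigma_3: Z' \to \tilde{Z}$ where $Z' \subseteq Z$, such that for all $\tilde{r} \in \tilde{R}$,

    $\tilde{\beta}_{\tilde{r}} = \sigma_3 \circ \beta_{\sigma_2(\tilde{r})} \circ \sigma_1$.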

This concept can be viewed pictorially as in Fig. 2.8.


Fig. 2.8. M Realizes M under 

Example 2.3: Let Mg and Mg be the resettable machines shown in

Fig. 2. 9 and Fig. 2. 10. 



Fig. 2. 9. Resettable Machine Mg 



Fig. 2. 10. Resettable Machine Mg 

Then Mg realizes Mg under the triple (Oj, erg) where Oy (Ig) -> ^ 

is the identity, a 2 : Rg -> Rg is defined by o^r) = r^, and 

ff 3 : Zg -> Zg is the identity. To verify this claim we need only 

observe that /3^(x) = (x) for allx e (I„) + . 

r r j o 

Notice that the definition of realization for resettable machines
is less restrictive than that for sequential machines in the sense that
for resettable machines we only require the realizing system to
mimic the behavior of the reset states of the realized machine, while
in the sequential machine case the realizing system must mimic the
behavior of every state of the realized system. On the other hand, the
definition in the resettable case is more restrictive in the sense that
for each reset state in the realized machine not only does there exist
a state in the realizing machine which mimics its behavior, but we also
know how to get to that state.
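
For finite time-invariant machines, realization in the resettable sense can be checked by exploring pairs of states reached from corresponding reset states. The sketch below is one possible check, not a procedure given in the report; the dictionary encoding of machines, the name M_tilde for the realized machine, and the toggle and counter machines are assumptions. The input map is given on single symbols, which determines the homomorphism on sequences.

```python
def realizes(M, M_tilde, sigma1, sigma2, sigma3):
    """Check that M mimics every reset behavior of M_tilde under the three maps."""
    pending = [(M_tilde["rho"][r], M["rho"][sigma2[r]]) for r in M_tilde["rho"]]
    seen = set()
    while pending:
        pair = pending.pop()
        if pair in seen:
            continue
        seen.add(pair)
        q_t, q = pair
        for a_t in M_tilde["inputs"]:
            a = sigma1[a_t]                      # symbol-to-symbol input map
            z = M["lam"][(q, a)]
            # The output map must be defined on z and agree with M_tilde.
            if z not in sigma3 or sigma3[z] != M_tilde["lam"][(q_t, a_t)]:
                return False
            pending.append((M_tilde["delta"][(q_t, a_t)], M["delta"][(q, a)]))
    return True


# Hypothetical machines: a 2-state toggle realized by a modulo-4 counter.
toggle = {
    "inputs": ["t"],
    "delta": {(q, "t"): 1 - q for q in (0, 1)},
    "lam": {(q, "t"): q for q in (0, 1)},
    "rho": {"r": 0},
}
counter = {
    "inputs": ["t"],
    "delta": {(q, "t"): (q + 1) % 4 for q in range(4)},
    "lam": {(q, "t"): q for q in range(4)},
    "rho": {"r0": 0, "r1": 1},
}
sigma1 = {"t": "t"}
sigma3 = {0: 0, 1: 1, 2: 0, 3: 1}    # counter output modulo 2

ok = realizes(counter, toggle, sigma1, {"r": "r0"}, sigma3)
not_ok = realizes(counter, toggle, sigma1, {"r": "r1"}, sigma3)
```

Only states reachable from the chosen reset states are ever compared, which reflects the point made above that the realizing machine need only mimic the reset states of the realized machine.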

Before proceeding with our model of on-line diagnosis we must 
introduce a few notational conventions. The identity function on a 
set A will be denoted by e^. When it is clearly understood which 
set is being mapped the subscript will be deleted. 

If $A_1, \ldots, A_n$ is a sequence of $n$ sets, its cartesian product is
the set

$$A_1 \times \cdots \times A_n = \mathop{\times}_{i=1}^{n} A_i = \{(x_1, \ldots, x_n) \mid x_i \in A_i,\ i = 1, \ldots, n\}.$$

The cartesian product of an empty sequence of sets is taken to be any
singleton set.

Given a cartesian product $A = \mathop{\times}_{i=1}^{n} A_i$, a coordinate projection of $A$
is a function $P_i : A \to A_i$ defined by $P_i(x_1, \ldots, x_n) = x_i$.

If $f_1 : A \to B_1, \ldots, f_n : A \to B_n$ is a sequence of functions, the
cross-product function $\mathop{\times}_{i=1}^{n} f_i : A \to \mathop{\times}_{i=1}^{n} B_i$ is defined by

$$\Bigl(\mathop{\times}_{i=1}^{n} f_i\Bigr)(a) = (f_1(a), \ldots, f_n(a)).$$

The cross-product function can be used to extend coordinate projections
to project onto any subset of coordinates: if $C \subseteq \{1, \ldots, n\}$ then
$P_C : A \to \mathop{\times}_{i \in C} A_i$ is defined by $P_C = \mathop{\times}_{i \in C} P_i$. In particular,
$P_\emptyset$ is a constant function with domain $A$.
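
The projection and cross-product conventions above can be illustrated with a small sketch. The helper names `projection` and `cross_product` are ours, not the text's, and tuples stand in for elements of a cartesian product; note that `projection` always returns a tuple, so a single-coordinate projection yields a 1-tuple rather than the bare element $x_i$.

```python
from typing import Callable, Iterable, Sequence, Tuple


def cross_product(*fs: Callable) -> Callable:
    """Return the cross-product function a -> (f_1(a), ..., f_n(a))."""
    def f(a):
        return tuple(g(a) for g in fs)
    return f


def projection(coords: Iterable[int]) -> Callable[[Sequence], Tuple]:
    """Return P_C for a set C of 1-based coordinate indices.

    P_C is built as the cross product of the single-coordinate
    projections P_i, i in C, taken in increasing order of i.
    """
    singles = [(lambda x, i=i: x[i - 1]) for i in sorted(set(coords))]
    return cross_product(*singles)


x = ("a", 7, True)
p13 = projection({1, 3})      # keeps coordinates 1 and 3
p_empty = projection(set())   # constant function: every x maps to ()
```

With an empty coordinate set the result is the same empty tuple for every argument, which matches the remark that $P_\emptyset$ is a constant function.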
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2. 2 Resettable Systems with Faults 

Our model of a "resettable system with faults" is a specialization 
of Meyer’s general model of a "system with faults" [28]. 


Informally, a "system with faults" is a system, along with
a set of potential faults of the system and description of what
happens to the original system as the result of each fault.

The original system and the systems resulting from faults
are members of one of two prescribed classes of (formal)
systems, a "specification" class for the original system and
a "realization" class for the resulting systems. More precisely,
we say that a triple $(\mathcal{S}, \mathcal{R}, \rho)$ is a (system) representation
scheme if

i) $\mathcal{S}$ is a class of systems, the specification class,

ii) $\mathcal{R}$ is a class of systems, the realization class,

iii) $\rho : \mathcal{R} \to \mathcal{S}$ where, if $R \in \mathcal{R}$, $R$ realizes $\rho(R)$.

By a class of systems, in this context, we mean a class of
formal systems, i.e., a set of formally specified structures
of the same type, each having an associated behavior that is
determined by the structure [28].

In this study we are concerned with the reliable use of a system.
That is, we are concerned with degradations in structure which Meyer
calls "life defects." This is contrasted with reliable design, in which
case we would be concerned with "birth defects." Thus, in our case,
a specification is a realization and we choose a representation scheme
$(\mathcal{R}, \mathcal{R}, \rho)$ where $\rho$ is the identity function on $\mathcal{R}$.

Assuming that a faulty resettable system has the same input,
output, and reset alphabets as the fault-free system $S$, the following
class of resettable systems will suffice as a realization class:

$$\mathcal{S}(I,Z,R) = \{S' \mid S' = (I, Q', Z, \delta', \lambda', R, \rho')\}.$$

In summary, the representation scheme that we are choosing for
our study of on-line diagnosis is the scheme $(\mathcal{R}, \mathcal{R}, \rho)$ where
$\mathcal{R} = \mathcal{S}(I,Z,R)$ and $\rho$ is the identity function on $\mathcal{R}$.

In such a scheme the seemingly difficult problem of describing
faults and their results becomes relatively straightforward. Before
we state our particular notion of a fault and its results we will repeat
here Meyer's general notion of a "system with faults" [28].

A system with faults in a representation scheme
$(\mathcal{S}, \mathcal{R}, \rho)$ is a structure $(S, F, \varphi)$ where

i) $S \in \mathcal{S}$,

ii) $F$ is a set, the faults of $S$,

iii) $\varphi : F \to \mathcal{R}$ such that, for some $f \in F$,
$\rho(\varphi(f)) = S$.

If $f \in F$, the system $S^f = \varphi(f)$ is the result of $f$. If $\rho(S^f) = S$
then $f$ is improper (by iii), $F$ contains at least one improper
fault); otherwise it is proper. A realization $S^f$ is fault-free
if $f$ is improper; otherwise $S^f$ is faulty [28].

In applying this notion to our study we must first define what we
mean by a fault of a resettable system. Given a resettable system
$S \in \mathcal{S}(I,Z,R)$, a fault $f$ of $S$ can be regarded as a transformation of
$S$ into another system $S' \in \mathcal{S}(I,Z,R)$ at some time $\tau$. Accordingly,
the resulting faulty system looks like $S$ up to time $\tau$ and like $S'$
thereafter. Since $S$ may be in operation at time $\tau$ we must also be
concerned with the question of what happens to the state of $S$ as this
transformation takes place. We handle this with a function $\theta$ from
the state set of $S$ to that of $S'$. The interpretation of $\theta$ is that if $S$ is
in state $q$ immediately before time $\tau$ then $S'$ is in state $\theta(q)$ at time
$\tau$. More precisely,

Definition 2.7: If $S \in \mathcal{S}(I,Z,R)$, a fault of $S$ is a triple

$$f = (S', \tau, \theta)$$

where $S' \in \mathcal{S}(I,Z,R)$, $\tau \in T$, and $\theta : Q \to Q'$.


A fault $f = (S', \tau, \theta)$ of $S$ is a permanent fault if $S'$ is time-invariant.
We view the occurrence of a fault $f = (S', \tau, \theta)$ of a system $S$ as
shown in Fig. 2.11.

Fig. 2.11. A Fault $f = (S', \tau, \theta)$ of $S$



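Given this formal representation of a fault of $S$, the resulting
faulty system is defined as follows.

Definition 2.8: The result of $f = (S', \tau, \theta)$ is the system

$$S^f = (I, Q^f, Z, \delta^f, \lambda^f, R, \rho^f)$$

where $Q^f = Q \cup Q'$,

$$\delta^f(q,a,t) = \begin{cases} \delta(q,a,t) & \text{if } q \in Q \text{ and } t < \tau - 1 \\ \theta(\delta(q,a,t)) & \text{if } q \in Q \text{ and } t = \tau - 1 \\ \delta'(q,a,t) & \text{if } q \in Q' \text{ and } t \ge \tau \end{cases}$$

$$\lambda^f(q,a,t) = \begin{cases} \lambda(q,a,t) & \text{if } q \in Q \text{ and } t < \tau \\ \lambda'(q,a,t) & \text{if } q \in Q' \text{ and } t \ge \tau \end{cases}$$

$$\rho^f(r,t) = \begin{cases} \rho(r,t) & \text{if } t < \tau \\ \theta(\rho(r,t)) & \text{if } t = \tau \\ \rho'(r,t) & \text{if } t > \tau. \end{cases}$$

(Arguments not specified in the above definitions may be assigned
arbitrary values.)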

In justifying this representation of the resulting faulty system one
should regard a fault $f = (S', \tau, \theta)$ as actually occurring between time
$\tau - 1$ and $\tau$. Note that, for any fault $f$ of $S$, $S^f \in \mathcal{S}(I,Z,R)$.
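
Definition 2.8 can be exercised with a short simulation. What follows is a minimal sketch under stated assumptions, not the construction used in this study: the `System` container, the `run` helper, and the toggle machine with its stuck-at-1 output are hypothetical names and examples of ours. Because the two state sets coincide in the example, the case split is made on time alone, which is permitted since unspecified arguments may take arbitrary values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class System:
    delta: Callable  # delta(q, a, t) -> next state
    lam: Callable    # lam(q, a, t)   -> output symbol
    rho: Callable    # rho(r, t)      -> state entered when reset r is released at t


def result_of_fault(s: System, s_prime: System, tau: int, theta: Callable) -> System:
    """Build S^f for the fault f = (S', tau, theta), following Definition 2.8."""
    def delta_f(q, a, t):
        if t < tau - 1:
            return s.delta(q, a, t)
        if t == tau - 1:
            return theta(s.delta(q, a, t))  # state is carried over through theta
        return s_prime.delta(q, a, t)

    def lam_f(q, a, t):
        return s.lam(q, a, t) if t < tau else s_prime.lam(q, a, t)

    def rho_f(r, t):
        if t < tau:
            return s.rho(r, t)
        if t == tau:
            return theta(s.rho(r, t))
        return s_prime.rho(r, t)

    return System(delta_f, lam_f, rho_f)


def run(s: System, r, t0: int, xs: Sequence) -> List:
    """Output sequence when reset r is released at time t0 and xs is then applied."""
    q, out = s.rho(r, t0), []
    for k, a in enumerate(xs):
        out.append(s.lam(q, a, t0 + k))
        q = s.delta(q, a, t0 + k)
    return out


# Hypothetical example: a toggle machine whose output becomes stuck-at-1 at time 3.
toggle = System(delta=lambda q, a, t: 1 - q, lam=lambda q, a, t: q, rho=lambda r, t: 0)
stuck = System(delta=toggle.delta, lam=lambda q, a, t: 1, rho=toggle.rho)
faulty = result_of_fault(toggle, stuck, tau=3, theta=lambda q: q)

good_run = run(toggle, "r", 0, "aaaaaa")
bad_run = run(faulty, "r", 0, "aaaaaa")
```

In this run the two output sequences agree at times 0, 1, 2 and the faulty system emits only 1's from time 3 on, which is the "looks like $S$ up to time $\tau$ and like $S'$ thereafter" reading of the definition.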

Example 2. 4 : Recall that in Example 2. 2 was transformed into 
Mj at time 100. We would say now that f = (M^, 100, e) is a permanent 
fault of and that S is the result of f (i. e. , S = M* ). 

Example 2. 5 : Again consider as implemented by the circuit in 
Fig. 2. 2 and let g be the fault which is caused by d^ becoming stuck-at-1 
at time 50. Then g = (M'^, 50, 0 ) is a permanent fault of where 
is the machine shown in Fig. 2. 12 and 0 : -> Q’^ is defined by the 


table

    $q$     $\theta(q)$
    00      10
    01      11
    10      10
    11      11


Fig. 2. 12. Resettable Machine M” 

m| will behave as up to time 50 and thereafter it will produce a 
constant sequence of l's. 

To complete the model, a resettable system with faults, in this
representation scheme, is a structure

$$(S, F, \varphi)$$

where $S \in \mathcal{S}(I,Z,R)$, $F$ is a set of faults of $S$ including at least one
improper fault (e.g., $f = (S, 0, e)$), and $\varphi : F \to \mathcal{S}(I,Z,R)$ where
$\varphi(f) = S^f$ for all $f \in F$. Given this definition, we can drop the explicit
reference to $\varphi$ in denoting a resettable system with faults, i.e., $(S, F)$ will
mean $(S, F, \varphi)$ where $\varphi$ is as defined above.
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In the remainder of this study we will be dealing exclusively with 
resettable systems. Thus we will refer to resettable systems simply 
as systems and to resettable machines as machines. 

A word is in order about our definition of faults. The interpretation
here is one of effect, not cause; e.g., we do not talk of stuck-at-1
OR gates but rather of the system which is created due to some presumed
physical cause. We will refer to these physical causes as component
failures or simply as failures. A fault, by our definition, consists of
precisely that information which is needed to define the system which
results from the fault. This allows us to treat faults in the abstract,
independent of specific network realizations of the system and without
reference to the technology employed in this realization and the types
of failures which are possible with this technology. We are assured,
however, that for each fault we have enough information to assess the
structural and behavioral effects of the fault, in particular as these
effects relate to fault diagnosis and tolerance.

There are limits, however, to how much can be done with a purely
effect-oriented concept of faults. When a system is sufficiently structured
to allow a reasonable notion of what may cause a fault we certainly will
want to make use of this notion. When this is the case we may, through
an abuse of language, refer to a specific failure at time $\tau$ as a fault.
What we will mean is that we have stated a cause of a fault and that there
is a unique fault which is the result of this failure at time $\tau$.
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It is interesting to see what the scope of our definition of fault is 
in terms of the types of failures which will result in faults. Recall that
a fault $f$ of a system $S$ is a triple, $f = (S', \tau, \theta)$, where $S' \in \mathcal{S}(I,Z,R)$.
Thus $S'$ is a (resettable) system with the same input, output, and reset
alphabets as $S$. The previous sentence contains, implicitly, every
restriction that we have put on faults. First of all, $S'$ is a (resettable)
system. Thus it remains within our universe of discourse. In particular,
its reset inputs still act like reset inputs. That is, they cause
$S'$ to go into a particular state regardless of the state it was in when the
reset input was applied. The restrictions on the input, output, and reset
alphabets are reasonable since after a fault occurs the system
presumably will have the same input and output terminals as it had
before the fault occurred.

Let $f = (S', \tau, \theta)$ be a fault. Because $S'$ may vary with time we have
considerable latitude in the types of failures which we may consider.
In particular, we may consider simultaneous permanent failures in one
or more components, simultaneous intermittent failures in one or more
components, or any combination of the above occurring at the same or
varying times. For example, a fault $f$ may be caused by an AND gate
becoming stuck-at-1 at time $\tau_1$, followed by an OR gate becoming
stuck-at-0 at time $\tau_2$.

Let us now compute the behavior of $S^f$ in state $q$. Let $x = a_1 \ldots a_n \in I^+$. Then

$$\beta^f_q(x,t) = \lambda^f(q,x,t) = \lambda^f(\delta^f(q, a_1 \ldots a_{n-1}, t), a_n, t+n-1).$$

There are three cases which must be considered.

Case i) $q \in Q$ and $t + n - 1 < \tau$. Then

$$\beta^f_q(x,t) = \lambda(\delta(q, a_1 \ldots a_{n-1}, t), a_n, t+n-1) = \beta_q(x,t).$$

Case ii) $q \in Q$, $t + n - 1 \ge \tau$, and $t < \tau$. Say $t + n - m = \tau$. Then

$$\beta^f_q(x,t) = \lambda'(\delta'(\theta(\delta(q, a_1 \ldots a_{n-m}, t)), a_{n-m+1} \ldots a_{n-1}, t+n-m), a_n, t+n-1)$$

$$= \beta'_{\theta(\delta(q, a_1 \ldots a_{n-m}, t))}(a_{n-m+1} \ldots a_n, t+n-m) = \beta'_{\theta(\delta(q,y,t))}(z, \tau)$$

where $y = a_1 \ldots a_{n-m}$ and $z = a_{n-m+1} \ldots a_n$.

Case iii) $q \in Q'$ and $t \ge \tau$. Then

$$\beta^f_q(x,t) = \lambda'(\delta'(q, a_1 \ldots a_{n-1}, t), a_n, t+n-1) = \beta'_q(x,t).$$

Thus we have proved:

Theorem 2.1: Let $S$ be a system and $f = (S', \tau, \theta)$ a fault of $S$. Then for
each $t \in T$ and $x \in I^+$

$$\beta^f_q(x,t) = \begin{cases} \beta_q(x,t) & \text{if } q \in Q \text{ and } t + |x| \le \tau \\ \beta'_{\theta(\delta(q,y,t))}(z,\tau) & \text{if } q \in Q,\ t + |x| > \tau, \text{ and } t < \tau, \text{ where } x = yz \text{ and } |y| = \tau - t \\ \beta'_q(x,t) & \text{if } q \in Q' \text{ and } t \ge \tau. \end{cases}$$

(As in the definitions of $\delta^f$ and $\lambda^f$, arguments not specified may be
assigned arbitrary values.)



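Corollary 2.1.1: Let $S$ be a system and $f = (S', \tau, \theta)$ a fault of $S$. Then
for each $r \in R$, $t \in T$, and $x \in I^+$

$$\beta^f_{r,t}(x) = \begin{cases} \beta_{r,t}(x) & \text{if } t + |x| \le \tau \\ \beta'_{\theta(\delta(\rho(r,t),y,t))}(z,\tau) & \text{if } t + |x| > \tau \text{ and } t \le \tau, \text{ where } x = yz \text{ and } |y| = \tau - t \\ \beta'_{r,t}(x) & \text{if } t > \tau. \end{cases}$$

Proof: By its definition

$$\beta^f_{r,t}(x) = \beta^f_{\rho^f(r,t)}(x,t).$$

Again we have three cases to consider.

Case i) $t + |x| \le \tau$. Then $t < \tau$ and $\rho^f(r,t) = \rho(r,t) \in Q$.
Therefore by Theorem 2.1

$$\beta^f_{\rho^f(r,t)}(x,t) = \beta_{\rho(r,t)}(x,t) = \beta_{r,t}(x).$$

Case ii) $t + |x| > \tau$ and $t \le \tau$. If $t < \tau$ then $\rho^f(r,t) = \rho(r,t) \in Q$
and case ii) of Theorem 2.1 applies with $\rho(r,t)$ in place of $q$. If
$t = \tau$ then $\rho^f(r,t) = \theta(\rho(r,t)) \in Q'$ and case iii) of the theorem
applies, giving us

$$\beta^f_{\rho^f(r,t)}(x,t) = \beta'_{\theta(\rho(r,t))}(x,t) = \beta'_{\theta(\delta(\rho(r,t),\Lambda,t))}(x,t),$$

which is the stated form with $y = \Lambda$ (the empty word), $z = x$, and $t = \tau$.

Case iii) $t > \tau$. In this case $\rho^f(r,t) = \rho'(r,t) \in Q'$. Therefore

$$\beta^f_{r,t}(x) = \beta'_{r,t}(x).$$
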

We have noted that we will often be interested in the physical cause
of a fault. For example, in a network realization of a machine we may
be interested in faults which are caused by a specific NAND gate
becoming stuck-at-1. Since this gate failure results in different faults
as we consider it occurring at different times it seems natural to give
a name to this family of faults. More generally, we will define an
equivalence relation on a set of faults such that a family of faults such as
we have just mentioned will be an equivalence class.

First we must define an equivalence relation on $\mathcal{S}(I,Z,R)$ such
that two systems $S, S' \in \mathcal{S}(I,Z,R)$ are equivalent if they are identical
except for a shift in time.

Definition 2.9: Let $S, S' \in \mathcal{S}(I,Z,R)$. $S'$ is a $\tau$-translation of $S$ if
$Q = Q'$ and for all $q \in Q$, $a \in I$, $r \in R$, and $t \in T$

i) $\delta(q,a,t) = \delta'(q,a,t+\tau)$

ii) $\lambda(q,a,t) = \lambda'(q,a,t+\tau)$

iii) $\rho(r,t) = \rho'(r,t+\tau)$.

If $S'$ is a $\tau$-translation of $S$ then it can be shown that for all $q \in Q$,
$r \in R$, $x \in I^+$, and $t \in T$

$$\beta_q(x,t) = \beta'_q(x,t+\tau)$$

and

$$\beta_{r,t}(x) = \beta'_{r,t+\tau}(x).$$

Definition 2.10: Let $(S, F)$ be a system with faults and let $f_1 = (S_1, \tau_1, \theta_1)$
and $f_2 = (S_2, \tau_2, \theta_2) \in F$. Then $f_1$ is equivalent to $f_2$ ($f_1 \equiv f_2$) if $S_1$
is a $(\tau_1 - \tau_2)$-translation of $S_2$ and $\theta_1 = \theta_2$.

Theorem 2.2: The above relations are equivalence relations.

Proof: The relation of "$\tau$-translation" is an equivalence relation on
$\mathcal{S}(I,Z,R)$ because "=" is an equivalence relation. The relation "$\equiv$" on
a set of faults of a system is an equivalence relation because
"$\tau$-translation" and "=" are both equivalence relations.


Notation: We denote the equivalence class of $F$ which contains the
fault $f = (S, \tau, \theta)$ by $[f]_F$. When the class of faults is clear we will drop
the $F$. Generally if $F$ is not mentioned we take it to be the set of all
possible faults of a system $S$. We let $f_i = (S_i, i, \theta)$ denote the fault in
$[f]$ which occurs at time $i$. When dealing with behaviors, $\beta^{f_i}$ will denote
the behavior of $S^{f_i}$, and $\beta^i$ will denote the behavior of $S_i$.

Let $f_i = (S_i, i, \theta)$ and $f_j = (S_j, j, \theta)$ be equivalent faults of a machine
$M$. Since $M$ is an $(i-j)$-translation of itself, it can be verified directly
from Definition 2.8 that $M^{f_i}$ is an $(i-j)$-translation of $M^{f_j}$. Hence,


Theorem 2.3: Let $f$ be a fault of $M$ and let $f_i, f_j \in [f]$. Then for all
$q \in Q$, $x \in I^+$, $r \in R$ and $t \in T$

$$\beta^{f_i}_q(x,t+i) = \beta^{f_j}_q(x,t+j)$$

and

$$\beta^{f_i}_{r,t+i}(x) = \beta^{f_j}_{r,t+j}(x).$$

In this section we have defined and studied the notion of a fault
of a system. In the remainder of this study we shall limit our
investigations to the case in which the fault-free system is time-invariant.
That is, we shall be studying faults of machines. If $f = (S', \tau, \theta)$ is a
fault of a machine $M$ we will allow $S'$ to vary with time.

2.3 Fault Tolerance and Errors

Given a system with faults $(S, F)$ and a proper fault $f \in F$, an
immediate question is whether the faulty system $S^f$ is usable in the
sense that its behavior resembles, within acceptable limits, that of the
fault-free system $S$. We will use the general notion of a "tolerance
relation" [28] to make more precise what is meant by "acceptable
limits." A tolerance relation for a representation scheme $(\mathcal{S}, \mathcal{R}, \rho)$ is
a relation $\tau$ between $\mathcal{R}$ and $\mathcal{S}$ ($\tau \subseteq \mathcal{R} \times \mathcal{S}$) such that, for all $R \in \mathcal{R}$,
$(R, \rho(R)) \in \tau$ (i.e., $\rho \subseteq \tau$). In this section we will develop the
particular notions of "acceptable limits" that we will be using in this study of
on-line diagnosis.

Given a machine $M$ it will be understood that $M$ realizes a specific
reduced and reachable machine $\tilde{M}$ under the triple $(\sigma_1, \sigma_2, \sigma_3)$. Under
the intended interpretation, $\tilde{M}$ serves as the specification of some
desired behavior and $M$ serves as the fault-free realization of this
behavior. This relationship between $\tilde{M}$ and $M$ will underlie our basic
notions of fault tolerance, error and on-line diagnosis.

In this study we will only be concerned with the behavior of $M$
under those resets and inputs which correspond via $\sigma_1$ and $\sigma_2$ to resets
and inputs of $\tilde{M}$. No requirements will ever be put on $\beta_r(x)$ or $\beta^f_{r,t}(x)$,
where $f$ is a fault of $M$, if $r \notin \sigma_2(\tilde{R})$ or $x \notin \sigma_1(\tilde{I}^+)$ because these are
considered to be "non-code space resets" and "non-code space inputs."
For this reason we will always assume that $\sigma_1$ and $\sigma_2$ are onto. In
actually dealing with machines for which $\sigma_1$ or $\sigma_2$ is not onto, occurrences
of "non-code space resets" and "non-code space inputs" could be
ignored or they could be treated as errors which must be detected.
These two options correspond to Carter and Schneider's [7] Don't
Care Assignments 1 and 2.

We will be using two basic notions of fault tolerance. The first,
and weaker, corresponds to the preservation of the behavior of $M$
only insofar as its mimicking of $\tilde{M}$ is concerned.


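Definition 2.11: Let $f$ be a fault of a machine $M$. Then $f$ is 1-tolerated
by $M$ for resets at time $t$ if for all $r \in \tilde{R}$

$$\tilde{\beta}_r = \sigma_3 \circ \beta^f_{\sigma_2(r),t} \circ \sigma_1.$$

Alternatively, since $\sigma_1$ and $\sigma_2$ are onto and since $\tilde{\beta}_r = \sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1$,
$f$ is 1-tolerated by $M$ for resets at time $t$ if for all $r \in R$

$$\sigma_3 \circ \beta_r = \sigma_3 \circ \beta^f_{r,t}.$$
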


In the special case where f is 1 -tolerated by M for resets at time 
0, we will simply say that f is 1 -tolerated by M. 

The second, and stronger, notion of tolerance does not allow for 
the tolerance of any change in behavior. 

Definition 2.12: Let $f$ be a fault of a machine $M$. Then $f$ is 2-tolerated
by $M$ for resets at time $t$ if for all $r \in R$

$$\beta_r = \beta^f_{r,t}.$$

Again, $f$ is 2-tolerated by $M$ if it is 2-tolerated by $M$ for resets
at time 0.

Our definition of 1-tolerated induces a relation $\tau_1$ on $\mathcal{R}$ where
$M^f \,\tau_1\, M$ if and only if $f$ is 1-tolerated by $M$. If $f$ is improper then $M^f = M$
and thus $f$ is 1-tolerated by $M$. Hence $M \,\tau_1\, M$, and therefore $\tau_1$ is
a tolerance relation. Likewise 2-tolerated induces a tolerance relation
$\tau_2$. If $f$ is 2-tolerated by $M$ then we can see that $f$ is 1-tolerated by $M$.
Hence, as sets, $\tau_2 \subseteq \tau_1$. Finally, note that if $\sigma_3$ is 1-1 and $f$ is
1-tolerated by $M$ then $f$ is 2-tolerated by $M$.

Example 2.6: Let $M$ be the realization of $\tilde{M}$ which consists of 3 copies
of $\tilde{M}$, a voter, and a disagreement detector as shown in Fig. 2.13. Then
any fault $f$ which affects only one copy of $\tilde{M}$ is 1-tolerated but may not
be 2-tolerated, and its presence may be detected by the disagreement
detector.

Fig. 2.13. Triple Modular Redundancy with Voting
and Disagreement Detecting
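
The distinction drawn in Example 2.6 can be seen in a few lines of code. This is a rough sketch under our own assumptions: the realization's output symbol is taken to be the pair (voted value, disagreement flag), $\sigma_3$ is taken to be the projection onto the voted value, and the copy output sequences are made-up data rather than anything from Fig. 2.13.

```python
from collections import Counter


def tmr_step(outputs):
    """Majority-vote the three copy outputs and flag any disagreement."""
    voted, _count = Counter(outputs).most_common(1)[0]
    disagree = len(set(outputs)) > 1
    return voted, disagree


def tmr_run(copy_outputs):
    """Apply the voter and disagreement detector at every time step."""
    return [tmr_step(step) for step in zip(*copy_outputs)]


def sigma3(z):
    """Assumed output map: keep the voted value, drop the disagreement flag."""
    return z[0]


spec = [0, 1, 1, 0, 1]            # hypothetical fault-free copy output
stuck_copy = [1, 1, 1, 1, 1]      # one copy with its output stuck-at-1

fault_free = tmr_run([spec, spec, spec])
faulty = tmr_run([spec, stuck_copy, spec])
```

Under $\sigma_3$ the faulty run is indistinguishable from the fault-free one, so no 1-error occurs on this input, while the raw output sequences differ in the disagreement flag, so a 2-error does occur.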

Our definitions of 1-tolerated and 2-tolerated by $M$ for resets at time $t$
are refined notions of fault tolerance. Coarser notions, and ones more
in keeping with the literature, would be behavioral equivalence for
resets at any time. We prefer our finer definitions, for with them the
effects of time can be more naturally analyzed. One question which
we will study later is: For resets at how many (and which) times must
a fault be tolerated for it to be tolerated for resets at any time?

When a discussion or theorem applies equally well to 1-tolerated
and to 2-tolerated we will just use the general term "tolerated." We
also do this later in this section when we discuss "errors."

Theorem 2.4: Let $f = (S', \tau, \theta)$ be a fault of machine $M$. Then $f$ is
tolerated by $M$ for resets at time $t$ if and only if $f_{\tau - t}$ is tolerated by $M$.

Proof: By Theorem 2.3, $\beta^{f_\tau}_{r,t} = \beta^{f_{\tau-t}}_{r,0}$. Hence, $\sigma_3 \circ \beta^{f_\tau}_{r,t} = \sigma_3 \circ \beta^{f_{\tau-t}}_{r,0}$,
and $\sigma_3 \circ \beta_r = \sigma_3 \circ \beta^{f_\tau}_{r,t}$ if and only if $\sigma_3 \circ \beta_r = \sigma_3 \circ \beta^{f_{\tau-t}}_{r,0}$. This
establishes the result.


Thus, $f_i, f_j, f_k, \ldots$ are tolerated by $M$ for resets at times $t_1, t_2, t_3, \ldots$
respectively if and only if $f_{i-t_1}, f_{j-t_2}, f_{k-t_3}, \ldots$ are tolerated by $M$, where
by "$F$ is tolerated by $M$" we mean that each $f \in F$ is tolerated by $M$. Due
to this we will always consider resets to be released at time 0 when
dealing with fault tolerance of machines and no generality will be lost.
Clearly, due to Theorem 2.3, this same sort of time translation can be
applied to any other behavioral attribute.


Example 2. 7 : Let be the sequence generator shown in Fig. 2. 14. 

This machine could be implemented by the circuit shown in Fig. 2. 15. 


0 
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Fig. 2. 15. Circuit for 

Let f be a fault of which is caused by becoming stuck-at-1 at 
time r. Then f = (M^, r, B) where is the machine represented by 
the graph in Fig. 2. 16 and B is as indicated below. 


    $q$     $\theta(q)$
    00      10
    01      11
    10      10
    11      11



Fig. 2. 16. Machine 







1 i 

Consider f_j, i. e. , the fault (M^, -1,0), and note that ( 3 q (11) = 1 

whereas /3 q( 11) = 0. Thus f ^ is not 2-tolerated by On the other 
hand both and will produce the sequence 00010101. . . when 
reset at -10. Thus f j is 2-tolerated by for resets at -10. By 
applying Theorem 2. 4 we can learn that L is not 2-tolerated by 
for resets at time i + 1 and that f^ is 2-tolerated by 

Corresponding to our two types of fault tolerance we can define 
two types of errors. 

Definition 2.13: Let $M$ be a machine, $r \in R$, $x \in I^+$, and $y \in Z^+$ where
$|x| = |y|$. The triple $(r,x,y)$ is called a 1-error (2-error) of $M$ if
$\sigma_3(\beta_r(x)) \ne \sigma_3(y)$ ($\beta_r(x) \ne y$).

If $(r,x,y)$ is an error of $M$ and $f$ is a fault of $M$ for which
$\beta^f_r(x) = y$ then we say that the fault $f$ causes the error $(r,x,y)$. Note
that any given error could be caused by many different faults.

The relation between fault tolerance and errors is very simple.
A fault $f$ is tolerated if and only if it causes no errors. The relation
between 1-errors and 2-errors is also straightforward. Namely,
every 1-error is a 2-error, and if $\sigma_3$ is 1-1 then every 2-error is a
1-error. Errors are very important in any study of fault diagnosis
because a fault can never be detected until it causes an error. The general
goal of on-line diagnosis is protection against undesirable behavioral
manifestations of faults, i.e., protection against errors.

An error $(r, ua, vb)$ where $a \in I$ and $b \in Z$ is a minimal error if
$(r, u, v)$ is not an error. If $(r,x,y)$ is a minimal 1-error then it is a
2-error but not necessarily a minimal 2-error. This notion of minimal
(or first) errors will be central to our notion of diagnosis. A minimal
error $(r,x,y)$ is said to occur at time $|x| - 1$. This is the time at
which the last symbol in $y$ is emitted.
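
The time at which a minimal error occurs can be computed by scanning the fault-free and observed output sequences in parallel. The sketch below is ours; `first_error_time` is a hypothetical helper, and the output symbols are made-up pairs whose first coordinate is kept by the assumed $\sigma_3$.

```python
def first_error_time(expected, observed, sigma3=lambda z: z):
    """Return the time |x| - 1 of the minimal error along one run, or None.

    `expected` plays the role of the fault-free output for some reset r and
    input x, and `observed` the output y actually seen. With the default
    sigma3 (the identity) this locates the minimal 2-error; with a
    non-injective sigma3 it locates the minimal 1-error.
    """
    for t, (e, o) in enumerate(zip(expected, observed)):
        if sigma3(e) != sigma3(o):
            return t
    return None


expected = [(0, False), (1, False), (0, False)]
observed = [(0, True), (1, False), (1, True)]

t2 = first_error_time(expected, observed)                       # minimal 2-error
t1 = first_error_time(expected, observed, sigma3=lambda z: z[0])  # minimal 1-error
```

Here the minimal 2-error occurs at time 0 while the minimal 1-error occurs only at time 2, which illustrates why a minimal 1-error is a 2-error but need not be a minimal 2-error.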

Often we will be in a situation where we are concerned with a
machine $M$ tolerating a set of faults which are all caused by the same
phenomenon but which may occur at any time. More specifically, let
$f$ be a fault of $M$. We would like results which assure us that if some
finite subset of $[f]$ is tolerated by $M$ then all of $[f]$ is tolerated by
$M$. Later we will be interested in the same problem with regard to
diagnosis.

Our first result of this nature hinges on the fact that any reachable
state of an $\ell$-reachable machine is reachable by time $\ell$.

Theorem 2.5: Let $f$ be a fault of an $\ell$-reachable machine $M$ and suppose
$f_i$ is tolerated by $M$ for $0 \le i \le \ell$. Then $f_i$ is tolerated by $M$ for all
$i \ge 0$.

Proof: Assume, to the contrary, that $f_i$ is not tolerated by $M$ for some
$i > \ell$. Then there exists an error $(r,x,y)$ which is caused by $f_i$.
Hence $\beta^{f_i}_r(x) = y$. Let $x = x_1 x_2$ and $y = y_1 y_2$ where $|x_1| = |y_1| = i$.
By Corollary 2.1.1 we know that

$$\beta^{f_i}_r(x) = \beta_r(x_1)\, \beta^i_{\theta(\delta(\rho(r),x_1))}(x_2, i) = y_1 y_2.$$

Let $q = \delta(\rho(r), x_1)$. Since $M$ is $\ell$-reachable, there exists $s \in R$ and
$u \in I^+$ such that $|u| = j \le \ell$ and $\delta(\rho(s),u) = q$. By Theorem 2.3,
$\beta^i_{\theta(q)}(x_2, i) = \beta^j_{\theta(q)}(x_2, j)$. Therefore, if $\beta_s(u) = v$ then

$$\beta^{f_j}_s(u x_2) = \beta_s(u)\, \beta^j_{\theta(\delta(\rho(s),u))}(x_2, j) = v\, \beta^i_{\theta(q)}(x_2, i) = v y_2.$$

Clearly, $(s, u x_2, v y_2)$ is an error and it is caused by $f_j$. Therefore $f_j$ is
not tolerated. Contradiction. This establishes the result.

The following general example shows that Theorem 2. 5 is the 
strongest result possible, in the sense that if the hypothesis is at all 
weakened then there exists a fault f and a machine M for which the 
conclusion is invalid. 

Example 2.8: Consider the $\ell$-reachable autonomous machine $M_\ell$ shown
in Fig. 2.17. Let $m$ be an integer between 0 and $\ell$, and let
$f = (M_\ell, \tau, \theta)$ be a fault of $M_\ell$ where

$$\theta(q_j) = \begin{cases} q_j & \text{if } j \ne m \\ \delta(q_j, 0) & \text{if } j = m. \end{cases}$$

Consider $M_\ell$ to be realizing itself. That is, take $\tilde{M} = M_\ell$.
The occurrence of $f = (M_\ell, \tau, \theta)$ has an effect on the behavior of
$M_\ell$ if and only if $M_\ell$ could be in state $q_m$ at time $\tau$. Therefore, $f_i = (M_\ell, i, \theta)$
is tolerated by $M_\ell$ if and only if $i \not\equiv m \pmod{\ell + 1}$. Hence
"$f_i$ is tolerated by $M_\ell$ for $i = 0, \ldots, m-1, m+1, \ldots, \ell$" does not imply "$f_i$
is tolerated by $M_\ell$ for all $i \ge 0$." Since both $m$ and $\ell$ were arbitrarily
chosen, this general example shows that the hypothesis of Theorem 2.5
cannot be weakened.
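
The counting argument in Example 2.8 can be checked by simulation. Since the figure is not available here, the sketch below assumes a particular machine: an autonomous cycle of $\ell + 1$ states $q_0, \ldots, q_\ell$, a single reset into $q_0$, and output equal to the state index. These choices, and the function names, are assumptions of ours; only faults occurring at times $i \ge 0$ are modelled.

```python
def outputs(ell, horizon, fault_time=None, m=None):
    """Outputs of the assumed cycle machine reset at time 0.

    If fault_time is given, the fault (M, fault_time, theta) is applied, where
    theta moves state q_m one step ahead and fixes every other state.
    """
    n = ell + 1
    q, out = 0, []
    if fault_time == 0 and q == m:
        q = (q + 1) % n  # reset state passes through theta when the fault is at time 0
    for t in range(horizon):
        out.append(q)
        q = (q + 1) % n
        if fault_time is not None and t == fault_time - 1 and q == m:
            q = (q + 1) % n  # theta applied to the state entered at the fault time
    return out


def tolerated(ell, m, i, horizon=None):
    """True if the fault at time i leaves the observed behavior unchanged."""
    horizon = horizon or (i + 2 * (ell + 1))
    return outputs(ell, horizon) == outputs(ell, horizon, i, m)


# With ell = 3 and m = 2, the fault is tolerated at times 0, 1 and 3 but not at 2 or 6.
not_tolerated = [i for i in range(9) if not tolerated(3, 2, i)]
```

For this assumed machine the untolerated fault times are exactly those congruent to $m$ modulo $\ell + 1$, so tolerance at every time in $\{0, \ldots, \ell\}$ other than $m$ says nothing about time $m + \ell + 1$.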

Let us now look at faults which occur before time 0. In the
previous result we have not mentioned this case because if $f_i$ and $f_j$
are equivalent faults and $i$ or $j$ is less than 0 then there is, in general,
no relation between the behaviors of $M^{f_i}$ and $M^{f_j}$ for resets released at
time 0. However, in the important special case where $f = (M', \tau, \theta)$
is a permanent fault, any $f_i \in [f]$ with $i < 0$ will, with respect to resets
released at time 0, cause identical behavior.


Lemma 2.6: Let $f = (M', \tau, \theta)$ be a permanent fault of $M$. Then
$\beta^{f_i}_r = \beta^{f_j}_r$ for all $r \in R$ and $i, j < 0$.

Proof: Let $i, j < 0$. Because $f$ is permanent, $f_i = (M', i, \theta)$ and
$f_j = (M', j, \theta)$. By Corollary 2.1.1, $\beta^{f_i}_r = \beta'_r$ and $\beta^{f_j}_r = \beta'_r$ for all $r \in R$.
This establishes the result.

Theorem 2.7: Let $f$ be a permanent fault of an $\ell$-reachable machine $M$.
If $f_i$ is tolerated by $M$ for $-1 \le i \le \ell$ then $f_i$ is tolerated by $M$ for all
$i \in T$.

Proof: By Lemma 2.6, $\beta^{f_i}_r = \beta^{f_{-1}}_r$ for all $i < 0$. Hence, $f_{-1}$ is tolerated
by $M$ implies that $f_i$ is tolerated by $M$ for all $i < 0$. By Theorem 2.5,
$f_i$ is tolerated by $M$ for all $i \ge 0$. This establishes the result.

Before leaving this line of development we will make some final
observations. Note that a machine $M$ is 0-reachable if and only if
$\rho(R) = P$. In particular, every memoryless machine is 0-reachable. By
Theorem 2.5, if $M$ is 0-reachable and $f_0$ is tolerated by $M$ then $f_i$ is
tolerated by $M$ for all $i \ge 0$.

If $f = (M', \tau, \theta)$ is a fault of $M$ we think of $f$ as affecting the reset
mechanism of $M$ if $\rho'(r) \ne \theta(\rho(r))$ for some $r \in R$. If this is not the case
then a further result, similar to Lemma 2.6, can be obtained.

Lemma 2.8: Let $f = (M', \tau, \theta)$ be a permanent fault of $M$ and suppose
that $\rho'(r) = \theta(\rho(r))$ for all $r \in R$. Then $\beta^{f_i}_r = \beta^{f_j}_r$ for all $r \in R$ and $i, j \le 0$.

Proof: Since $\rho'(r) = \theta(\rho(r))$, by Corollary 2.1.1, $\beta^{f_0}_r = \beta'_r$ for all $r \in R$.
The result now follows just as in the proof of Lemma 2.6.

Putting the above observations together yields:

Theorem 2.9: Let $f = (M', \tau, \theta)$ be a permanent fault of $M$. Suppose
that $\rho'(r) = \theta(\rho(r))$ for all $r \in R$ and that $\rho(R) = P$. If $f_i$ is tolerated by
$M$ for any $i \le 0$ then $f_i$ is tolerated by $M$ for all $i \in T$.

Proof: By Lemma 2.8, $f_i$ is tolerated by $M$ for all $i \le 0$. Since $\rho(R) = P$,
$M$ is 0-reachable. Therefore, by Theorem 2.5, $f_i$ is tolerated by $M$
for all $i \ge 0$. This establishes the result.
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2.4 On-line Diagnosis 

Our notion of on-line diagnosis of a system involves an external 
detector (assumed to be fault -free) which observes the input and the 
output of the system and makes a decision as to whether the behavior 
of the system is within "acceptable limits" as set forth by our notions 
of fault tolerance. Initial synchronization of the system with its 
detector is achieved by using the same reset to initialize both systems. 

The formal relation between a system and its detector is that of 
a "cascade connection. " 

Definition 2.14: The cascade connection of two systems $S_1$ and $S_2$ for
which $R_1 = R_2$ and $I_2 = Z_1 \times I_1$ is the system

$$S_1 * S_2 = (I_1, Q, Z_2, \delta, \lambda, R_1, \rho)$$

where

$$Q = Q_1 \times Q_2$$

$$\delta((q_1,q_2),a,t) = (\delta_1(q_1,a,t),\ \delta_2(q_2,(\lambda_1(q_1,a,t),a),t))$$

$$\lambda((q_1,q_2),a,t) = \lambda_2(q_2,(\lambda_1(q_1,a,t),a),t)$$

$$\rho(r,t) = (\rho_1(r,t),\ \rho_2(r,t)).$$

Schematically, $S_1 * S_2$ can be pictured as in Fig. 2.18.

Fig. 2.18. The Cascade Connection of $S_1$ and $S_2$
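
Definition 2.14 translates almost line for line into code. The sketch below represents a system as a dictionary of three callables, which is a convention of ours, and the example machine `m1`, its stuck-at-1 variant, and the duplicate-and-compare detector `d` are hypothetical.

```python
def cascade(s1, s2):
    """Cascade connection S1 * S2; each system is a dict with 'delta', 'lam', 'rho'."""
    def delta(q, a, t):
        q1, q2 = q
        z1 = s1["lam"](q1, a, t)                      # S1's output feeds S2
        return (s1["delta"](q1, a, t), s2["delta"](q2, (z1, a), t))

    def lam(q, a, t):
        q1, q2 = q
        return s2["lam"](q2, (s1["lam"](q1, a, t), a), t)

    def rho(r, t):
        return (s1["rho"](r, t), s2["rho"](r, t))     # the same reset initializes both

    return {"delta": delta, "lam": lam, "rho": rho}


def run(s, r, xs, t0=0):
    """Output sequence of system s for reset r released at t0 followed by inputs xs."""
    q, out = s["rho"](r, t0), []
    for k, a in enumerate(xs):
        out.append(s["lam"](q, a, t0 + k))
        q = s["delta"](q, a, t0 + k)
    return out


# Hypothetical S1: the state is XORed with the input and the new state is emitted.
m1 = {"delta": lambda q, a, t: q ^ a, "lam": lambda q, a, t: q ^ a, "rho": lambda r, t: 0}
m1_stuck = dict(m1, lam=lambda q, a, t: 1)            # same machine, output stuck-at-1

# Hypothetical S2: keeps its own copy of S1 and emits 1 when S1's output disagrees.
d = {
    "delta": lambda q, za, t: q ^ za[1],
    "lam": lambda q, za, t: 0 if za[0] == (q ^ za[1]) else 1,
    "rho": lambda r, t: 0,
}

ok = run(cascade(m1, d), "r", [1, 0, 1, 1])
bad = run(cascade(m1_stuck, d), "r", [1, 0, 1, 1])
```

The second system sees both the first system's output and the original input at each step, which is exactly the requirement $I_2 = Z_1 \times I_1$.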


Notation: If $u = z_1 z_2 \ldots z_n \in Z^+$ and $v = a_1 a_2 \ldots a_n \in I^+$ then the pair
$[u, v]$ will denote the sequence $(z_1, a_1)(z_2, a_2) \ldots (z_n, a_n) \in (Z \times I)^+$.

Let $S_1 * S_2$ be the cascade connection of $S_1$ with $S_2$. Let $\beta^1$, $\beta^2$,
and $\beta^*$ denote the behavior functions of $S_1$, $S_2$, and $S_1 * S_2$ respectively.
It can be shown directly from the definition of a cascade connection that
for all $x \in I^*$, $q_1 \in Q_1$, $q_2 \in Q_2$, $r \in R_1$, and $t \in T$,

$$\beta^*_{(q_1,q_2)}(x,t) = \beta^2_{q_2}([\beta^1_{q_1}(x,t), x], t)$$

and

$$\beta^*_{r,t}(x) = \beta^2_{r,t}([\beta^1_{r,t}(x), x]).$$

We can now formally define our notion of on-line diagnosis.

Definition 2.15: Let $(M, F)$ be a machine with faults, let $D$ be a machine
for which $M * D$ is defined, and let $k$ be a nonnegative integer. $(M,F)$ is
$(D, k)$-1-diagnosable (2-diagnosable) if

i) $\beta^*_r = 0$ for all $r \in R$, and

ii) if (r,x, w) is a minimal 1-error (2-error) caused by some f e F 
then 

^J?([ xy]) f 0^1 for all y £ I* with |y | = k . 

f 

Thus, the detector $D$ observes the operation of $M$ and must make
a decision based on this observation as to whether an error has occurred.
Note that the fault-free realization $M$ and the detector are both
time-invariant (i.e., machines), and that the detector takes no part in the
computation of $M$'s output.



Fig. 2. 19. Diagnosis of (M, F) using the Detector D 
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The two conditions of Definition 2. 15 can be paraphrased as: 
i) $D$ responds negatively if no fault occurs; i.e., $D$ gives no
false alarms, and

ii) for all $f \in F$, $D$ responds positively within $k$ time steps of the
occurrence of the first error caused by $f$.

Condition i) implies $0 \in Z_D$, the output alphabet of $D$. Each
$z \in Z_D$ other than 0 is called a fault-detection signal. The choice of the
symbol "0" to indicate that the machine $M$ is operating properly is
purely for notational convenience. In general we could let any subset
of $Z_D$ indicate proper operation and let the complement of this set in
$Z_D$ be the set of fault-detection signals. In a practical application this
choice would depend on the design constraints on the detector.
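
The two paraphrased conditions can be checked on a single run, as in the following sketch. It is an illustration only and does not establish diagnosability, which quantifies over all resets, inputs, and faults in $F$. The toggle-style machine, its stuck-at-1 fault, and the duplicate-and-compare detector are hypothetical examples of ours.

```python
def machine_outputs(xs, stuck_from=None):
    """Hypothetical machine: state q is XORed with each input and the new state
    is emitted. From time stuck_from on, the output is stuck-at-1."""
    q, out = 0, []
    for t, a in enumerate(xs):
        q ^= a
        out.append(1 if stuck_from is not None and t >= stuck_from else q)
    return out


def detector_outputs(zs, xs):
    """Duplicate-and-compare detector: 0 means no fault seen, 1 is a
    fault-detection signal."""
    q, out = 0, []
    for z, a in zip(zs, xs):
        q ^= a
        out.append(0 if z == q else 1)
    return out


def first_nonzero(seq):
    """Index of the first nonzero symbol in seq, or None if there is none."""
    return next((t for t, v in enumerate(seq) if v != 0), None)


xs = [1, 0, 1, 1, 0, 0]
good = machine_outputs(xs)
bad = machine_outputs(xs, stuck_from=1)

no_false_alarm = first_nonzero(detector_outputs(good, xs)) is None      # condition i)
error_time = next(t for t, (g, b) in enumerate(zip(good, bad)) if g != b)
detect_time = first_nonzero(detector_outputs(bad, xs))
delay = detect_time - error_time                                        # compare with k
```

On this run the fault occurs at time 1 but causes no error until time 2, and the detector signals at that same step, so the observed delay is 0.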

As we have done with fault tolerance and with errors, if a theorem
or remark applies to both "1-diagnosable" and "2-diagnosable" we will
just state it once using the general term "diagnosable."

Let $D$ be a detector for $M$. Then $I_D = Z \times I$. There will be times
when the observation of $M$'s input by $D$ will be unnecessary or undesired.
If for all $z \in Z$ and $a, b \in I$, $(z,a)$ and $(z,b)$ are equivalent inputs of $D$
then we will say that $D$ is independent of $M$'s input. In this case the
behavior of $D$ does not depend on the second coordinate of $D$'s input and
we will take $I_D$ to be simply $Z$.

Recall that with this concept of diagnosis we are only considering
faults of $M$. Faults of $D$ must be analyzed separately. In
finding a realization $M$ of $\tilde{M}$ and a detector $D$ there is some leeway in
how much of the added complexity required for diagnosis should go
into the detector and how much should go into the realization. If it all
goes into the realization then $D$ will serve only to select out certain
coordinates of $M$'s output to be used as the output of $D$. That is, $D$
will be memoryless and realize a projection. In this case we will
say that $(M, F)$ is $k$-self-diagnosable. In general, it is desirable for
the detector to be self-diagnosable for some suitable set of faults.

The basic on-line diagnosis problem can be stated as follows:

Given a machine $\tilde{M}$, a class of faults $F$, a class of detectors $\mathcal{D}$,
and a delay $k$, find an (economical) realization $M$ of $\tilde{M}$ and a
detector $D \in \mathcal{D}$ such that $(M, F)$ is $(D, k)$-diagnosable.

In this chapter we have developed a model for the study of on-line
diagnosis of resettable machines, and we have stated the basic
on-line diagnosis problem. We end this chapter by stating some fundamental
questions, the answers to which will help solve this basic problem.
We will begin to answer these questions in the following chapters.

I. Given $\tilde{M}$, M, and F, does there exist a detector D and a
delay k such that (M, F) is (D, k)-diagnosable?

II. If such a D and k exist, how does one construct an optimal or
near-optimal detector? What might be criteria for optimality?

III. What time-space tradeoffs are possible between the added
complexity needed for diagnosis and the maximum allowable delay?
We expect that there will be situations where, if the detector is given
additional time in which to indicate an error, diagnosis may be
simplified.

IV. What are good on-line diagnosis techniques? When is each 
technique applicable? How does one compare techniques? 

V. What relationships exist between faults and errors? Given
M and F, what errors are possible? Given $\tilde{M}$ and F, how can one find
a realization M of $\tilde{M}$ such that the machine with faults (M, F) gives rise
only to errors of a given type? These are important questions
because given a diagnosis technique or a particular type of detector,
it will often be easy to determine just what types of errors are
detectable. The faults that are diagnosable will then have to be inferred
from this information. Conversely, we will want to find realizations
such that the faults we are concerned with will cause errors that we
can detect.

VI. What properties of system structure and system behavior are
conducive to on-line diagnosability? Structural properties are
important for it is expected that they will relate directly to diagnosis 
techniques. Behavioral properties could be used to measure the 
inherent diagnosability of a given behavior in terms of the minimum 
added complexity which would be required to obtain a given level of 
on-line diagnosis. 



CHAPTER III 


General Properties of Diagnosis 

In this short chapter we will present a few results on diagnosis
per se. That is, they are general results which tell us some things
about diagnosis, independent of the particular fault set being diagnosed 
or of any particular diagnosis technique. In the following chapters 
we look at the diagnosis of specific sets of faults and investigate 
the capabilities and limitations of on-line diagnosis techniques. 

It is interesting to see how our concept of on-line diagnosis
compares with a similar concept introduced by Carter and Schneider
[7] and called "fault-secure" by Anderson [1]. As stated by
Anderson, "A circuit is fault-secure if, for every fault in a prescribed
set, the circuit never produces incorrect code space outputs
for code space inputs."

Before making a formal comparison we must translate this 
notion into our framework. In doing so we will strive to be faithful 
to Anderson's intent. 

Definition 3.1: A machine with faults, (M, F), is fault-secure if
(r, x, ya), where $a \in Z$, is a minimal 2-error caused by some $f \in F$
implies $a \notin \{\beta_r(x) \mid r \in R, x \in I^+\}$.

Thus if (M, F) is fault-secure then a combinational detector which
only observes the output of M can detect all minimal 2-errors. More
formally,

Theorem 3.1: (M, F) is fault-secure if and only if (M, F) is (D, 0)-2-diagnosable
where D is memoryless and independent of M's input.


Proof: (Necessity) Assume that (M, F) is fault-secure. Define
$\lambda_D : Z \to \{0, 1\}$ by

$$\lambda_D(z) = \begin{cases} 0 & \text{if } z \in \{\beta_r(x) \mid r \in R, x \in I^+\} \\ 1 & \text{otherwise} \end{cases}$$

Let D be the memoryless detector which realizes $\lambda_D$. Then D is
independent of M's input and it can easily be verified that (M, F) is
(D, 0)-2-diagnosable.

(Sufficiency) Assume that (M, F) is (D, 0)-2-diagnosable where D is
memoryless and independent of M's input. Let $\lambda_D : Z \to \{0, 1\}$
denote the function realized by D and let $Z' = \{\beta_r(x) \mid r \in R, x \in I^+\}$.
Then $\lambda_D(z) = 0$ for all $z \in Z'$, for otherwise a false alarm could occur.
Let (r, x, ya), where $a \in Z$, be a minimal 2-error caused by some $f \in F$.
If $a \in Z'$ then $\lambda_D(a) = 0$ and f is not detected without delay.
Therefore $a \notin Z'$. Hence (M, F) is fault-secure.
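
The detector used in the necessity half of this proof can be illustrated with a short sketch. The Python fragment below is only an illustration under stated assumptions: the machine is given as hypothetical dictionaries `delta` and `lam` (next-state and output tables), `resets` lists the reset states, and the toy machine at the end is invented for the example. The function `code_space` collects the set Z' of outputs the fault-free machine can emit, and `make_detector` returns a memoryless function playing the role of $\lambda_D$.

```python
def code_space(inputs, delta, lam, resets):
    """Return Z': every output symbol the fault-free machine can emit."""
    reachable, frontier = set(resets), list(resets)
    while frontier:
        q = frontier.pop()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return {lam[(q, a)] for q in reachable for a in inputs}


def make_detector(z_prime):
    """Memoryless detector: 0 for outputs in Z', 1 (fault signal) otherwise."""
    return lambda z: 0 if z in z_prime else 1


# Hypothetical toy machine whose fault-free outputs are the words "00" and "11".
inputs = [0, 1]
delta = {("p", 0): "p", ("p", 1): "q", ("q", 0): "q", ("q", 1): "p"}
lam = {("p", 0): "00", ("p", 1): "11", ("q", 0): "11", ("q", 1): "00"}
z_prime = code_space(inputs, delta, lam, resets=["p"])
lambda_D = make_detector(z_prime)
```

With this toy machine the detector signals on the word "01", which never occurs as a fault-free output. Whether every fault in F actually drives the output outside Z' is exactly the fault-secure condition, and the sketch does not check it.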


Thus the concept of (D, k)-diagnosable is a generalization of the
concept of fault-secure. In particular, (D, k)-diagnosis allows for
(i) different tolerance relations, (ii) nonzero delay in diagnosis,
(iii) detectors with memory, and (iv) explicit observation by the
detector of the input to the system being monitored.

The following result is a consequence of the fact that every
1-error is a 2-error but not conversely.

Theorem 3.2: If (M, F) is (D, k)-2-diagnosable then (M, F) is (D, k)-1-diagnosable,
but not conversely.

Proof: Let (M, F) be (D, k)-2-diagnosable. Then no false alarms will
occur and every minimal 2-error will be detected within k time steps
of its occurrence. Let (r, x, y) be a minimal 1-error. Then
$\sigma_3(\hat\beta_r(x)) \neq \sigma_3(y)$ and hence $\hat\beta_r(x) \neq y$. Thus $(r, x_1, y_1)$ is a
minimal 2-error for some $x_1$ and $y_1$ such that $x = x_1 x_2$ and $y = y_1 y_2$.
Since this minimal 2-error is detected within k time steps of its
occurrence the minimal 1-error (r, x, y) must also be detected within
k time steps of its occurrence. Hence (M, F) is (D, k)-1-diagnosable.

The counterexample which shows that the converse does not
hold is given in the next chapter in the proof of Theorem 4.4.

Due to this result, in stating theorems "1-diagnosable" is a
weaker hypothesis than "2-diagnosable."

Although the converse of Theorem 3. 2 does not hold in general, 
the following partial converse can be obtained. 
Theorem 3.3: If (M, F) is (D, k)-1-diagnosable and $\sigma_3$ is 1-1 then
(M, F) is (D, k)-2-diagnosable.

Proof: We observed in Section 2.3 that if $\sigma_3$ is 1-1 then every
2-error is a 1-error. The result is an immediate consequence of
this fact.

The next result will help us to see the relationship between
fault diagnosis and fault tolerance.

Theorem 3.4: Let (M, F) be a machine with faults. If F is tolerated
by M then (M, F) is $(D_0, 0)$-diagnosable where $D_0$ is a trivial memoryless
machine which realizes the constant 0 function.

Proof: Condition i) is clearly satisfied, and condition ii) is satisfied
because if F is tolerated by M then no $f \in F$ will cause any errors.

The decision in this case can be trivially made since no errors
are ever produced. The situation for tolerated faults is not so simple
as this result may seem to indicate, for it must be remembered that
1-tolerated does not imply 2-tolerated and thus a 1-tolerated fault
could be detected through a 2-error.

We will now develop some results concerning diagnosis which are 
analogous to Theorems 2. 5, 2. 7 and 2. 9. Recall that these theorems 
allowed us to infer the tolerance of an infinite set of equivalent faults 
from knowledge that a specific finite subset of them is tolerated. 
Theorem 3.5: Let M be a machine and let D be a detector for M.
Suppose that the cascade connection M * D is $\ell$-reachable, and that
f is a fault of M. If $(M, \{f_i\})$ is (D, k)-diagnosable for $0 \le i \le \ell$ then
$(M, \{f_i\})$ is (D, k)-diagnosable for all $i \ge 0$.


Proof: Assume that $(M, \{f_i\})$ is (D, k)-diagnosable for $0 \le i \le \ell$.
Then condition i) of Definition 2.15 is immediately satisfied. Let
(r, x, w) be a minimal error caused by $f_i$ where $i > \ell$, and let $u \in I^*$
with $|u| = k$. To show that $(M, \{f_i\})$ is (D, k)-diagnosable for all $i \ge 0$
we need only show that

$$\hat\beta^D_r([\hat\beta^{f_i}_r(xu), xu]) \neq 0^{|xu|}.$$

Let $x = x_1 z$ where $|x_1| = i$, and let $\delta^*(\rho^*(r), x_1) = (q, q')$. Since
M * D is $\ell$-reachable there exist $s \in R$ and $y \in I^*$ with $0 \le |y| \le \ell$
such that $\delta^*(\rho^*(s), y) = (q, q')$. Say $|y| = j$. Since $(M, \{f_j\})$ is
(D, k)-diagnosable, $\hat\beta^D_s([\hat\beta^{f_j}_s(yzu), yzu]) \neq 0^{|yzu|}$, and since the fault
detection signal must occur after the fault occurs,

$$\hat\beta^D_{q'}([\hat\beta^{f_j}_{\theta(q)}(zu, j), zu]) \neq 0^{|zu|}.$$

Now by Theorem 2.3, $\hat\beta^{f_i}_{\theta(q)}(zu, i) = \hat\beta^{f_j}_{\theta(q)}(zu, j)$ and hence

$$\hat\beta^D_{q'}([\hat\beta^{f_i}_{\theta(q)}(zu, i), zu]) \neq 0^{|zu|}.$$

Therefore

$$\hat\beta^D_r([\hat\beta^{f_i}_r(x_1 z u), x_1 z u]) = \hat\beta^D_r([\hat\beta^{f_i}_r(x_1), x_1])\,\hat\beta^D_{q'}([\hat\beta^{f_i}_{\theta(q)}(zu, i), zu]) \neq 0^{|xu|}.$$

Hence $(M, \{f_i\})$ is (D, k)-diagnosable for all $i \ge 0$.

Example 2.8, which shows that the hypothesis of Theorem 2.5
cannot be weakened, works likewise for Theorem 3.5. This example
works for both fault tolerance and fault diagnosis because, as was
pointed out by Theorem 3.4, tolerated faults are trivially diagnosable.

Theorem 3.6: Let M be a machine and let D be a detector for M
such that M * D is $\ell$-reachable. If f is a permanent fault of M and
$(M, \{f_i\})$ is (D, k)-diagnosable for $-1 \le i \le \ell$ then $(M, \{f_i\})$ is
(D, k)-diagnosable for all $i \in T$.

Proof: Assume that f is a permanent fault and that $(M, \{f_i\})$ is
(D, k)-diagnosable for $-1 \le i \le \ell$. By Theorem 3.5, $(M, \{f_i\})$ is
(D, k)-diagnosable for all $i \ge 0$. By Lemma 2.6, $\beta^{f_i}_r = \beta^{f_{-1}}_r$
for all $r \in R$ and $i < 0$. Hence every $f_i$ with $i < 0$ will cause
exactly the same errors. Since $(M, \{f_{-1}\})$ is (D, k)-diagnosable it
follows that $(M, \{f_i\})$ is (D, k)-diagnosable for all $i < 0$. This
establishes the result.

Let D be a detector for a machine M. It will often be the case
that the second coordinate of the state of M * D can be uniquely
determined from the first coordinate. In particular, this is always
the case when $|Q_D| = 1$. More formally, the cascade connection of
$M_1$ with $M_2$ is synchronized if there exists a function $h: Q_1 \to Q_2$
such that for each $(q_1, q_2)$ in the reachable part of $M_1 * M_2$,
$h(q_1) = q_2$. Such a function is called the synchronizing function of
$M_1 * M_2$ and it must satisfy $h(\rho_1(r)) = \rho_2(r)$ for each $r \in R$.
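
The definition of a synchronized cascade suggests a direct check: explore the reachable part of the cascade and record, for each first coordinate, the second coordinate seen with it. The following Python sketch does this under assumed conventions that are not part of the text: machines are hypothetical dictionaries (`d1`, `l1` for the next-state and output tables of the first machine, `d2` for the next-state table of the second machine on inputs of the form `(z, a)`), and `rho1`, `rho2` map reset symbols to initial states.

```python
def synchronizing_function(inputs, resets, d1, l1, d2, rho1, rho2):
    """Return h: Q1 -> Q2 if the cascade is synchronized, otherwise None."""
    h = {}
    seen = set()
    frontier = [(rho1[r], rho2[r]) for r in resets]
    while frontier:
        q1, q2 = frontier.pop()
        if (q1, q2) in seen:
            continue
        seen.add((q1, q2))
        # A state q1 paired with two different q2 values rules out any h.
        if h.setdefault(q1, q2) != q2:
            return None
        for a in inputs:
            z = l1[(q1, a)]  # the second machine observes (output, input)
            frontier.append((d1[(q1, a)], d2[(q2, (z, a))]))
    return h


# Hypothetical example: a toggling machine observed by a parity tracker.
d1 = {(0, "t"): 1, (1, "t"): 0}
l1 = {(0, "t"): 0, (1, "t"): 1}
d2_sync = {("even", (0, "t")): "odd", ("odd", (1, "t")): "even"}
h_sync = synchronizing_function(["t"], ["r"], d1, l1, d2_sync,
                                {"r": 0}, {"r": "even"})

# A second machine that cycles through three states cannot be synchronized.
d2_bad = {("a", (0, "t")): "b", ("b", (1, "t")): "c", ("c", (0, "t")): "a"}
h_bad = synchronizing_function(["t"], ["r"], d1, l1, d2_bad,
                               {"r": 0}, {"r": "a"})
```

In the first example each state of the toggling machine is always paired with the same tracker state, so a synchronizing function exists. In the second, state 0 is reached together with both "a" and "c", so the function returns None.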

If M * D is synchronized and M is $\ell$-reachable then M * D is
also $\ell$-reachable. We have observed in Chapter II that M is
0-reachable if and only if $\rho(R) = P$, and that, in particular, every
memoryless machine is 0-reachable. Hence if $\rho(R) = P$ and M * D
is synchronized then M * D is 0-reachable. In this case we know
that if $f_0$ is diagnosable then $f_i$ is diagnosable for all $i \ge 0$.

We terminate this line of development by stating the strongest 
result of this nature. 

Theorem 3.7: Let M be a machine for which $\rho(R) = P$. Let D be
a detector for M such that M * D is synchronized. Let $f = (M', \tau, \theta)$
be a permanent fault for which $\rho'(r) = \theta(\rho(r))$ for all $r \in R$. If
$(M, \{f_i\})$ is (D, k)-diagnosable for any $i \le 0$ then $(M, \{f_i\})$ is
(D, k)-diagnosable for all $i \in T$.

Proof: Assume that $(M, \{f_\ell\})$ is (D, k)-diagnosable where $\ell \le 0$.
By Lemma 2.8, $\beta^{f_i} = \beta^{f_j}$ for all $i, j \le 0$. Therefore $(M, \{f_i\})$ is
(D, k)-diagnosable for all $i \le 0$. Since $\rho(R) = P$ and M * D is
synchronized, M * D is 0-reachable. Thus by Theorem 3.5, $(M, \{f_i\})$
is (D, k)-diagnosable for all $i \ge 0$. This establishes the result.



CHAPTER IV


Diagnosis of Unrestricted Faults

With rapidly changing technology it is risky to rely too heavily
on the classical stuck-at model of circuit failures. Other failure
modes such as bridging failures have been proposed and studied
(see [26] and [15] for example) but little is known about the diagnosis
of such failures. Intermittent and multiple failures are also
possible. Adequate failure mode analysis often exists only for
outdated technology.

There are other problems in obtaining a suitably restricted set
of faults which are peculiar to on-line diagnosis. For a given
failure it may be impossible to determine the $\theta$ function of the fault
caused by this failure. Thus fault sets which do not restrict the
fault mapping $\theta$ are advantageous.

In this chapter we will develop some basic results concerning
the diagnosis of "unrestricted faults." This set of faults is truly
unrestricted for it is precisely the set of all faults of the machine
being diagnosed.

Unrestricted faults are typically diagnosed using the technique
of duplication. One of the aims of this chapter is to take a deeper
look at duplication and ... a generalization of this scheme.
An alternative to using duplication for the diagnosis of
unrestricted faults is investigated in Chapter V.

The main result in this chapter states that to achieve 1-diagnosis
of the unrestricted faults of a machine M, the detector must have at
least as many states as $\tilde{M}$, the behavioral specification for M.
Furthermore, to achieve 2-diagnosis, the detector must have at least as
many states as $M_R$, the reduction of M. These bounds on the state set
size of the detector are independent of the delay allowed for the diagnosis.
4.1 Unrestricted Faults

As stated above, the set of unrestricted faults of a machine
is simply the set of all faults of that machine. More formally,

Definition 4.1: The set of unrestricted faults of machine M, denoted
by $U_M$, is the set $U_M = \{f \mid f \text{ is a fault of } M\}$. That is,

$$U_M = \{(M', \tau, \theta) \mid M' \in \mathcal{M}(I, Z, R),\ \tau \in T, \text{ and } \theta: Q \to Q'\}.$$

When it is clear what machine is under consideration, the 
identifying subscript will be dropped. 

One important property of the set of unrestricted faults is the
relation between this fault set and the set of errors that may be
caused by faults in this set. Given any $r \in R$, $x \in I^+$ and $y \in Z^+$ with
$|x| = |y|$, there is a fault $f \in U$ such that $\hat\beta^f_r(x) = y$. Therefore
faults in U can cause any possible erroneous behavior, and for
(M, U) to be (D, k)-diagnosable all of these possible erroneous
behaviors will have to be detected by D.

Due to the above observation it is clear that the output of $M^f$
(the system actually being observed by the detector) can give no
information about what the correct output should be. Therefore,
for the diagnosis of unrestricted faults, the ability of D to observe
M's input directly is crucial. This observation is made explicit
in the following result.
Theorem 4.1: If (M, U) is (D, k)-1-diagnosable, D is independent of
M's input, and M is transition distinct then M is autonomous.

Proof: Suppose that (M, U) is (D, k)-1-diagnosable, D is independent
of M's input, and M is transition distinct. Assume, to the contrary,
that M is not autonomous. Then there exist $r \in R$ and $x, y \in I^+$
such that $|x| = |y|$ and $\sigma_3(\hat\beta_r(x)) \neq \sigma_3(\hat\beta_r(y))$. Let $v \in I^*$ with $|v| = k$.
For no false alarms to occur we must have $\hat\beta^D_r(\hat\beta_r(xv)) = 0^{|xv|}$ and
$\hat\beta^D_r(\hat\beta_r(yv)) = 0^{|yv|}$. Let $f \in U$ be a fault for which $\hat\beta^f_r(xv) = \hat\beta_r(yv)$.
Since $(r, x, \hat\beta_r(y))$ is a 1-error it must be detected within k time
steps of its occurrence. But $\hat\beta^D_r(\hat\beta^f_r(xv)) = \hat\beta^D_r(\hat\beta_r(yv)) = 0^{|yv|}$.
Contradiction. Hence M must be autonomous.
4.2 Diagnosis Via Independent Computation and Comparison

It is a well-known and obvious fact that if a system is duplicated
and both copies are run in parallel with the same inputs then by
dynamically comparing the outputs of the two copies any error
which does not appear simultaneously in both copies will be
immediately detected.

Our view of duplication is shown in Fig. 4.1.

Fig. 4.1. Diagnosis via Duplication in the Detector

In this figure the detector D consists of a copy of M along with a
generalized Exclusive-OR gate whose output is 0 if and only if its inputs
are identical. Given such a detector D, it is immediately clear that
(M, U) is (D, 0)-2-diagnosable.
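
The duplication scheme can be sketched in a few lines of Python. This is an illustrative sketch, not the construction of Fig. 4.1 itself: the class `Mealy`, the modulo-3 counter, and the injected stuck-at-0 output fault are all assumptions made for the example. The detector holds a private copy of the machine, steps it on the same input, and emits 0 exactly when the observed output matches the copy's output.

```python
class Mealy:
    """Minimal Mealy machine driven by next-state and output tables."""

    def __init__(self, delta, lam, reset):
        self.delta, self.lam, self.state = delta, lam, reset

    def step(self, a):
        z = self.lam[(self.state, a)]
        self.state = self.delta[(self.state, a)]
        return z


def duplication_detector(copy_of_m):
    """Detector D: run a private copy of M and compare outputs each step."""
    def detect(z, a):
        # Generalized Exclusive-OR: 0 iff the two outputs are identical.
        return 0 if copy_of_m.step(a) == z else 1
    return detect


# Hypothetical modulo-3 counter that outputs its current state.
delta = {(q, a): (q + a) % 3 for q in range(3) for a in (0, 1)}
lam = {(q, a): q for q in range(3) for a in (0, 1)}

monitored = Mealy(delta, lam, reset=0)
detect = duplication_detector(Mealy(delta, lam, reset=0))

signals = []
for t, a in enumerate([1, 1, 0, 1, 1]):
    z = monitored.step(a)
    if t >= 3:  # assumed fault: output stuck-at-0 from time 3 onward
        z = 0
    signals.append(detect(z, a))
```

In this run the correct output at time 3 is 2, so the stuck-at-0 output is an error and is signaled at once. At time 4 the correct output happens to be 0, so no error occurs and the detector stays silent.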

Duplication is an expensive technique, involving somewhat
more than twice the circuitry required for the unchecked system
alone, but it has a number of positive attributes. In addition to
being capable of diagnosing the unrestricted set of faults,
synthesis is easy and self-testing and self-diagnosable comparators
are known to exist [1].

The basic configuration shown in Fig. 4.1 can be generalized
to the configuration shown in Fig. 4.2.

Fig. 4.2. A Generalization of Duplication in the Detector

In this figure the detector consists of a machine M' which runs in
parallel with M and a combinational comparator C which dynamically
compares the outputs of M and M'. Note that for the cascade connection
M * D to be defined we must have $I' = I$ and $R' = R$.

With this scheme M' may be much less complex than M. However,
we will show that there is a relationship between the size of
the state set of M' and the level of diagnosis which may be possible
using M'.

In the following result we give a necessary and sufficient
condition for (M, U) to be (D, 0)-diagnosable where D is structured
as in Fig. 4.2. The basic intuition for this result is that (M, U)
is (D, 0)-1-diagnosable if and only if it is possible to perfectly
predict the behavior of M from that of M'.


Theorem 4.2: Let M realize $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3)$. Let [M', C]
denote a detector for M constructed from M' and C as shown in Fig. 4.2.
There exists $\sigma_3'$ such that M' realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3')$
if and only if there exists C such that (M, U) is ([M', C], 0)-1-diagnosable.
Similarly there exists $\sigma_3'$ such that M' realizes M
under $(e, e, \sigma_3')$ if and only if there exists a C such that (M, U) is
([M', C], 0)-2-diagnosable.


Proof: (Necessity) Assume that M' realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3')$.
Then $\sigma_3' \circ \beta'_{\sigma_2(r)} \circ \sigma_1 = \tilde\beta_r$ for all $r \in \tilde{R}$. Since M realizes $\tilde{M}$
under $(\sigma_1, \sigma_2, \sigma_3)$, $\sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1 = \tilde\beta_r$ for all $r \in \tilde{R}$. Hence

$$\sigma_3' \circ \beta'_{\sigma_2(r)} \circ \sigma_1 = \sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1.$$

Recall that $\sigma_1$ and $\sigma_2$ are assumed to be onto. Because of this
assumption, it follows that $\sigma_3 \circ \beta_r = \sigma_3' \circ \beta'_r$ for all $r \in R$. Let C be
the comparator shown in Fig. 4.3.

Fig. 4.3. The Comparator Used in the Proof of Theorem 4.2

Since $\sigma_3 \circ \beta_r = \sigma_3' \circ \beta'_r$, the detector [M', C] will give no false
alarms. Let (r, x, y) be a minimal 1-error caused by $f \in U$. Then
$\sigma_3(\beta_r(x)) \neq \sigma_3(\beta^f_r(x))$. Hence $\sigma_3'(\beta'_r(x)) \neq \sigma_3(\beta^f_r(x))$, and this will
cause the Exclusive-OR gate to emit a 1. Therefore the minimal
1-error (r, x, y) is detected with no delay. Hence (M, U) is
([M', C], 0)-1-diagnosable.

Similarly, if M' realizes M under $(e, e, \sigma_3')$ then $\beta_r = \sigma_3' \circ \beta'_r$
and a comparator as shown in Fig. 4.3, but without the $\sigma_3$ function,
can be used to achieve ([M', C], 0)-2-diagnosis of (M, U).

(Sufficiency) Assume that (M, U) is ([M', C], 0)-1-diagnosable. To
prove that there exists a $\sigma_3'$ such that M' realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3')$
we must exhibit a function $\sigma_3'$ and show that $\sigma_3 \circ \beta_r = \sigma_3' \circ \beta'_r$. This
is sufficient because M realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3)$
and $\sigma_1$ and $\sigma_2$ are assumed to be onto.

Since no false alarms may occur we know that $C(\beta_r(x), \beta'_r(x)) = 0$
for all $r \in R$ and $x \in I^+$. Define $\sigma_3'$ as follows: $\sigma_3'(\beta'_r(x)) = \sigma_3(\beta_r(x))$.
Since $\sigma_3'$ has the desired property we must simply verify that it is
indeed a function.

It is clear that every $z \in \{\beta'_r(x) \mid r \in R, x \in I^+\}$ has an image
under $\sigma_3'$. To see that this image is unique suppose that $\beta'_r(x) = \beta'_s(y)$.
We must show that $\sigma_3(\beta_r(x)) = \sigma_3(\beta_s(y))$. Let $\beta'_r(x) = a$,
$\sigma_3(\beta_r(x)) = b$, and $\sigma_3(\beta_s(y)) = c$. Then $C(b, a) = C(c, a) = 0$. Assume
to the contrary that $b \neq c$. Let $f \in U$ be a fault which causes the
output of M to be c at time $|x| - 1$ and which has no other effect.
Let $x = uv$ where $v \in I$. Then $(r, x, \hat\beta_r(u)c)$ is a minimal 1-error
and since $C(c, a) = 0$, it is not detected when it occurs. This contradicts
the assumption that (M, U) is ([M', C], 0)-1-diagnosable. Hence
$\sigma_3'$ is a function and M' realizes $\tilde{M}$ under $(\sigma_1, \sigma_2, \sigma_3')$.

The proof that (M, U) is ([M', C], 0)-2-diagnosable implies that
there exists a function $\sigma_3'$ such that M' realizes M under $(e, e, \sigma_3')$
is essentially the same as the above proof.

From Theorem 4.2 we know that if M realizes M' and M' is
reduced and reachable then $|Q| \ge |Q'|$. Hence Theorem 4.2 tells
us that if we use the scheme shown in Fig. 4.2 for the diagnosis of
unrestricted faults then we must have $|Q'| \ge |\tilde{Q}|$ in order to achieve
1-diagnosis, and $|Q'| \ge |Q_R|$ in order to achieve 2-diagnosis,
where $M_R$ is the reduction of M.
4.3 Diagnosis with Zero Delay

The question we will answer next is whether it is possible to
achieve (D, 0)-1-diagnosis of (M, U) with a detector which is less
complex, in terms of state set size, than the reduced and reachable
specification $\tilde{M}$. One reason to believe that this may be possible
is the observation that if M has an inverse then this inverse may
have fewer states than $\tilde{M}$, and yet a detector constructed using this
inverse may be capable of diagnosing all of U. Examples of such
inverses are given in the following chapter.


Theorem 4.3: If (M, U) is (D, 0)-1-diagnosable then $|Q_D| \ge |\tilde{Q}|$.


Proof: Let (M, U) be (D, 0)-1-diagnosable, and assume, to the
contrary, that $|Q_D| < |\tilde{Q}|$. Without loss of generality, assume
that M is reachable.

Claim: There exist $q, q' \in Q$ and $s \in Q_D$ such that $(q, s), (q', s) \in P^*$,
the reachable part of M * D, and $\sigma_3 \circ \beta_q \neq \sigma_3 \circ \beta_{q'}$.

Let $g: Q \to \mathcal{P}(Q_D) \setminus \{\emptyset\}$ (where $\mathcal{P}(Q_D) = \{X \mid X \subseteq Q_D\}$) be
defined by $g(q) = \{s \mid (q, s) \in P^*\}$. Assume that the claim is not
true. Then $\sigma_3 \circ \beta_q \neq \sigma_3 \circ \beta_{q'}$ implies $g(q) \cap g(q') = \emptyset$. We know
from the proof of Theorem A. 2 that for each $\tilde{q} \in \tilde{Q}$ there is a state
$f(\tilde{q})$ for which $\tilde\beta_{\tilde{q}} = \sigma_3 \circ \beta_{f(\tilde{q})} \circ \sigma_1$ and that f is necessarily 1-1.
Since $\tilde{M}$ is reduced and reachable there must exist $|\tilde{Q}| = \ell$ unique
states $\{q_1, \ldots, q_\ell\} \subseteq Q$ such that $i \neq j$ implies $g(q_i) \cap g(q_j) = \emptyset$,
and therefore $|Q_D| \ge |\tilde{Q}|$. Contradiction. This establishes the
claim.

Let $q, q' \in Q$ and $s \in Q_D$ be such that $(q, s), (q', s) \in P^*$ and
$\sigma_3 \circ \beta_q \neq \sigma_3 \circ \beta_{q'}$. Then there exists a sequence ua where $u \in I^*$
and $a \in I$ such that $\sigma_3(\beta_q(ua)) \neq \sigma_3(\beta_{q'}(ua))$ and if $u \neq \Lambda$ then
$\sigma_3(\hat\beta_q(u)) = \sigma_3(\hat\beta_{q'}(u))$. Since $(q, s) \in P^*$, there exist $r \in R$ and
$y \in I^*$ such that $\delta^*(\rho^*(r), y) = (q, s)$.

Recall that given any $r \in R$, $x \in I^+$ and $y \in Z^+$ with $|x| = |y|$, there is a
fault $f \in U$ such that $\hat\beta^f_r(x) = y$. Let $f \in U$ be a fault for which
$\hat\beta^f_r(yua) = \hat\beta_r(y)\hat\beta_{q'}(ua)$. Since it is known that $\sigma_3(\hat\beta_q(u)) = \sigma_3(\hat\beta_{q'}(u))$,
it follows that $(r, yua, \hat\beta^f_r(yua))$ is a minimal 1-error. Now (M, U) is
(D, 0)-1-diagnosable implies $\hat\beta^D_r([\hat\beta^f_r(yua), yua]) \neq 0^{|yua|}$. Since no false
alarms may occur, $\hat\beta^D_r([\hat\beta_r(y), y]) = 0^{|y|}$. Also, since $(q', s) \in P^*$,
$\hat\beta^D_s([\hat\beta_{q'}(ua), ua]) = 0^{|ua|}$. Now

$$\hat\beta^D_r([\hat\beta^f_r(yua), yua]) = \hat\beta^D_r([\hat\beta_r(y)\hat\beta_{q'}(ua), yua]) = \hat\beta^D_r([\hat\beta_r(y), y])\,\hat\beta^D_s([\hat\beta_{q'}(ua), ua]) = 0^{|y|}0^{|ua|} = 0^{|yua|}.$$

This contradicts the assumption that (M, U) is (D, 0)-1-diagnosable.
Therefore $|Q_D| \ge |\tilde{Q}|$.

Corollary 4.3.1: If (M, U) is (D, 0)-2-diagnosable then $|Q_D| \ge |Q_R|$,
where $M_R$ is the reduction of M.

Proof: Assume that (M, U) is (D, 0)-2-diagnosable, and consider
M to be realizing $M_R$. By Theorem 3.2, (M, U) is (D, 0)-1-diagnosable,
and hence, by Theorem 4.3, $|Q_D| \ge |Q_R|$.

Let us now consider the set of faults of M which are caused by
the output of M becoming stuck-at-v, where $v \in Z$, at some time t.
More formally, the set of permanent output faults of M is the set

$$F_0 = \{f = (M', \tau, e) \mid M' = (I, Q, Z, \delta, \lambda', R, \rho) \text{ where } \lambda'(q, a) = \lambda'(s, b) \text{ for all } q, s \in Q \text{ and } a, b \in I\}$$

Because the set of permanent output faults causes the same minimal
2-errors as the set of unrestricted faults, if $(M, F_0)$ is (D, 0)-2-diagnosable
then (M, U) is (D, 0)-2-diagnosable. However, U and $F_0$ do
not cause the same minimal 1-errors, and, in fact, $(M, F_0)$ is
(D, 0)-1-diagnosable does not imply that (M, U) is (D, 0)-1-diagnosable.
These statements are proved in the following result.

Theorem 4.4: $(M, F_0)$ is (D, 0)-2-diagnosable if and only if (M, U)
is (D, 0)-2-diagnosable. However, $(M, F_0)$ is (D, 0)-1-diagnosable
does not imply that (M, U) is (D, 0)-1-diagnosable.
Proof: Let $(M, F_0)$ be (D, 0)-2-diagnosable. Let (r, ya, w), where
$a \in I$, be a minimal 2-error which is caused by $f \in U$. To show that
(M, U) is (D, 0)-2-diagnosable it suffices to show that $\beta^D_r([\hat\beta^f_r(ya), ya]) \neq 0$.
Since (r, ya, w) is a minimal error, $\hat\beta_r(y) = \hat\beta^f_r(y)$ and $\beta_r(ya) \neq \beta^f_r(ya)$.
Say $\beta^f_r(ya) = b$, and consider the fault $f' \in F_0$ which is caused
by the output of M becoming stuck-at-b at time $|y|$. Then $\hat\beta^{f'}_r(ya) = \hat\beta^f_r(ya)$,
and f' also causes the minimal 2-error (r, ya, w). Since
$(M, F_0)$ is (D, 0)-2-diagnosable we know that $\beta^D_r([\hat\beta^{f'}_r(ya), ya]) \neq 0$.
Hence $\beta^D_r([\hat\beta^f_r(ya), ya]) \neq 0$ and (M, U) is (D, 0)-2-diagnosable.

Now assume that (M, U) is (D, 0)-diagnosable. Since $F_0 \subseteq U$,
it follows immediately that $(M, F_0)$ is (D, 0)-diagnosable.

We prove that $(M, F_0)$ is (D, 0)-1-diagnosable does not imply
(M, U) is (D, 0)-1-diagnosable by supplying a counterexample. Let
$M_1$, $D_1$, and $\sigma_3: Z \to Z$ be specified by the tables in Fig. 4.4.
Then $\tilde{M}_1$ is reduced and reachable, and $M_1$ realizes $\tilde{M}_1$ under
$(e, e, \sigma_3)$.
Since $|Q_{D_1}| < |\tilde{Q}_1|$ we know from Theorem 4.3 that $(M_1, U)$ is not
$(D_1, 0)$-1-diagnosable. To see that $(M_1, F_0)$ is $(D_1, 0)$-1-diagnosable
takes a bit of analysis. Briefly, states A, D, and E duplicate states
a, d and e and any error which occurs when $M_1$ is in one of these
states is immediately detected. If $M_1$ is in b or c then $D_1$ will be
in BC and if the output becomes stuck-at-2 or stuck-at-3 at this time it
will be immediately detected. If $M_1$ is in b or c and a stuck-at-0 or
stuck-at-1 fault occurs then it will be tolerated for one time step
and detected the next. This establishes the result.

In the above counterexample it is clear that $(M_1, F_0)$ is not
$(D_1, 0)$-2-diagnosable because a stuck-at-1 fault which occurs when
$M_1$ is in b causes a 2-error which is not immediately detected.
Therefore this example also proves that, in general, (M, F) is
(D, k)-1-diagnosable does not imply that (M, F) is (D, k)-2-diagnosable.
Also, if $(M, F_0)$ was (D, 0)-2-diagnosable for some D then by
Theorem 4.4 (M, U) would be (D, 0)-2-diagnosable and from Theorem
4.3 it would follow that $|Q_D| \ge |\tilde{Q}|$. Hence this is also an example
of how 1-diagnosis may be achieved with a detector which is less
complex than the least complex detector which is sufficient for
2-diagnosis.
4.4 Diagnosis with Nonzero Delay

Suppose now that we allow some arbitrary, but fixed, delay k > 0
in the detection process. Can this additional time be traded off for
less detector complexity? Unfortunately, for the unrestricted case,
the answer is no. In fact, if (M, U) is (D', k)-1-diagnosable then we
can construct a detector D, essentially by eliminating unnecessary
states of D', such that (M, U) is (D, 0)-1-diagnosable.

Before stating this result formally, we will establish an important
lemma.

Lemma 4.5: If (M, U) is (D', k)-1-diagnosable then there exists a
detector D such that $|Q_D| \le |Q_{D'}|$, (M, U) is (D, k)-1-diagnosable,
and for each $q \in Q_D$, $\lambda_D(q, (z, a)) = 0$ for some $(z, a) \in Z \times I$.

Proof: Assume that (M, U) is (D', k)-1-diagnosable and construct
D from D' as follows:

1) Delete from the state table of D' any row corresponding to
a state q for which
$0 \notin \{\lambda_{D'}(q, (z, a)) \mid (z, a) \in Z \times I\}$.

2) In the resulting table, replace every reference to the
deleted state with a reference to an arbitrary remaining state, and set
the corresponding output to 1.

3) Repeat steps 1) and 2) until no further deletions are possible.

Since $|Q_{D'}| < \infty$ the above algorithm will terminate in a finite
number of iterations.

From the nature of the above construction it is clear that
$|Q_D| \le |Q_{D'}|$ and for each $q \in Q_D$, $\lambda_D(q, (z, a)) = 0$ for some
$(z, a) \in Z \times I$. It only remains to be shown that (M, U) is
(D, k)-1-diagnosable.

If the detector D' is in a state q for which
$0 \notin \{\lambda_{D'}(q, (z, a)) \mid (z, a) \in Z \times I\}$, then an error must have occurred
because if D' is in q then an error detection signal will be emitted
regardless of the input to D'. Hence this error could be signaled
whenever a transition to q is indicated, and there would be no loss in
diagnosis and no possibility for a false alarm. Since all minimal errors
which q signaled would then be signaled before D' got to state q, q could
be eliminated. This is the essence of what is accomplished in steps 1)
and 2). This elimination process is necessarily iterative because step 2)
may introduce new states to be deleted.

Since this construction is diagnosis preserving, (M, U) is
(D, k)-1-diagnosable.
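
Steps 1) through 3) of this construction can be sketched as a small table-rewriting routine. The Python below is an illustration under assumed conventions, not the author's implementation: a detector is a hypothetical dictionary mapping each state to a row, and each row maps a detector input to a pair `(next_state, output)`. The "arbitrary remaining state" of step 2) is taken to be the first state left in the table, and the sketch stops if only one state remains.

```python
def prune_detector(table):
    """Repeatedly delete states that can never emit 0 (steps 1 to 3)."""
    table = {q: dict(row) for q, row in table.items()}  # work on a copy
    while True:
        # Step 1: states whose every output is a fault-detection signal.
        doomed = [q for q, row in table.items()
                  if all(out != 0 for _, out in row.values())]
        if not doomed or len(table) == 1:
            return table
        q = doomed[0]
        del table[q]
        # Step 2: redirect references to q and signal on that transition.
        target = next(iter(table))
        for row in table.values():
            for inp, (nxt, _) in list(row.items()):
                if nxt == q:
                    row[inp] = (target, 1)
        # Step 3: loop, since the rewrite may create new deletable states.


# Hypothetical three-state detector table over inputs "x" and "y".
d_prime = {
    "s0": {"x": ("s0", 0), "y": ("s1", 0)},
    "s1": {"x": ("s2", 0), "y": ("s2", 1)},
    "s2": {"x": ("s2", 1), "y": ("s2", 1)},
}
d = prune_detector(d_prime)
```

In this example deleting `s2` turns both outputs of `s1` into 1, so `s1` becomes deletable on the next pass. This is the iterative effect noted in the proof, and the result is a single-state table in which input "y" raises the signal immediately.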

Theorem 4.6: If (M, U) is (D', k)-1-diagnosable then there exists
a detector D with $|Q_D| \le |Q_{D'}|$ such that (M, U) is (D, 0)-1-diagnosable.

Proof: Assume that (M, U) is (D', k)-1-diagnosable. From Lemma
4.5 there exists a detector D such that $|Q_D| \le |Q_{D'}|$, (M, U) is
(D, k)-1-diagnosable, and for each $q \in Q_D$, $\lambda_D(q, (z, a)) = 0$ for some
$(z, a) \in Z \times I$.

Claim: (M, U) is (D, 0)-1-diagnosable.

Assume, to the contrary, that (M, U) is not (D, 0)-1-diagnosable.
Using induction on the delay of the diagnosis, we will deduce that
(M, U) is not (D, m)-1-diagnosable for all $m \ge 0$. This will establish
the result for it contradicts the hypothesis that (M, U) is
(D, k)-1-diagnosable.

Having assumed that the basis step for our induction is true,
we assume that (M, U) is not (D, m)-1-diagnosable for some $m \ge 0$, and
we must show that this implies (M, U) is not (D, m+1)-1-diagnosable.

Since (M, U) is not (D, m)-1-diagnosable, there exists a minimal
1-error (r, x, y) caused by $f \in U$ and a sequence $v \in I^*$ with $|v| = m$
such that $\hat\beta^D_r([\hat\beta^f_r(xv), xv]) = 0^{|xv|}$. Let $\delta_D(\rho_D(r), [\hat\beta^f_r(xv), xv]) = s$.
Let $(z, a) \in Z \times I$ be such that $\lambda_D(s, (z, a)) = 0$. By Lemma 4.5 we know
that such a (z, a) exists. Let f' be a fault for which
$\hat\beta^{f'}_r(xva) = \hat\beta^f_r(xv)z$. Then $(r, x, \hat\beta^{f'}_r(x))$ is a minimal 1-error but
$\hat\beta^D_r([\hat\beta^{f'}_r(xva), xva]) = 0^{|xva|}$. Hence (M, U) is not (D, m+1)-1-diagnosable.
Therefore, (M, U) is not (D, 0)-1-diagnosable implies (M, U)
is not (D, m)-1-diagnosable for all $m \ge 0$.

But we know that (M, U) is (D, k)-1-diagnosable. Hence (M, U)
is (D, 0)-1-diagnosable. This establishes the result.

Corollary 4.6.1: If (M, U) is (D, k)-1-diagnosable then $|Q_D| \ge |\tilde{Q}|$.

Proof: This is an immediate consequence of Theorem 4.6 and
Theorem 4.3.

Corollary 4.6.2: If (M, U) is (D, k)-2-diagnosable then $|Q_D| \ge |Q_R|$,
where $M_R$ is the reduction of M.

Proof: Assume that (M, U) is (D, k)-2-diagnosable, and consider M
to be realizing $M_R$. From Theorem 3.2, it follows that (M, U) is
(D, k)-1-diagnosable. The result now follows immediately from
Corollary 4.6.1.

We know from Theorem 4.4 and Corollary 4.3.1 that $(M, F_0)$
is (D, 0)-2-diagnosable implies $|Q_D| \ge |Q_R|$. Can this result be
generalized as was done for unrestricted faults by the previous
corollary? The following example shows that the answer is no.
This example serves as a good example of when a space-time tradeoff
is possible.

Example 4.1: Consider the machine and the detector of Fig. 4.5. Since
the machine is reduced and reachable, its state set has the same size
as the state set of its reduction.

Fig. 4.5. Machines Mg and Dg

Note that no output symbol can appear next to itself in any output
sequence produced by the machine. Since the detector will produce an
error detection signal precisely when two consecutive inputs to it are
identical, it can detect all permanent output faults of the machine with
a delay of at most one. Therefore the permanent output faults of the
machine are diagnosable by the detector with delay 1 (in the sense of
2-diagnosability), yet the machine has more states than the detector.




CHAPTER V 


Diagnosis Using Inverse Machines 

It is well known that many circuits can be diagnosed by what is
commonly called a "loop check." This involves regenerating the
input to the circuit from the output and then comparing the regenerated
input with the actual input. Often the "inverse" circuit is easier
to implement than the original circuit, thus providing a saving over
duplication. For example, division can be checked using multiplication.
It is also possible to have greater confidence in a loop check
than in duplication, especially if the checking circuit is less complex
than the original circuit.

In this chapter we will investigate the use of "inverse machines"
for diagnosis using a loop check. Informally, machine $\bar{M}$ is an
inverse of machine M if $\bar{M}$ can reconstruct the input to M from its
output with at most a finite delay.

Machines which have inverses can be characterized as being
those machines which are "information lossless." Information lossless
machines are machines whose behavior functions satisfy a
condition which is similar to, but weaker than, the condition which
a 1-1 function must satisfy.

Information lossless machines and inverse machines were first
introduced by Huffman [18]. Huffman devised a test for information
losslessness and for the existence of inverses. It should be pointed
out that our definitions of these notions are slightly less general
than Huffman's. The definitions in this paper are directed towards
the use of inverse machines for diagnosis.

Even [13] later devised a better means of determining information 
losslessness, and he presented two means for obtaining inverse 
machines. 

Information lossless machines and inverse machines are discussed
in textbooks by Kohavi [20] and Hennie [17]. Kohavi provides
a fuller description of Even's techniques for obtaining inverse
machines, and Huffman describes a different means of obtaining
inverse machines.

The questions about the use of inverse machines for diagnosis
which we seek to answer in this chapter are: When can an inverse
be used for the diagnosis of unrestricted faults? Given a machine
M and an inverse $\bar{M}$ of M, what will be the delay in diagnosis if $\bar{M}$
is used to diagnose M using a loop check? How can an arbitrary
machine be realized so that unrestricted fault diagnosis is possible
using a loop check?

We concentrate on unrestricted fault diagnosis in this chapter
because this is the most natural and important fault class which can
be diagnosed using a loop check. Inverse machines can be used for
the diagnosis of more restricted sets of faults, but synthesis and
analysis for more general levels of diagnosis seem to be very
difficult.

5.1 Inverses of Machines

Before we can formally define the inverse of a machine we need 
to introduce one preliminary notion. 

Definition 5.1: An (I, n)-delay machine (delay machine) is a machine
$M^n = (I, I^n, I, \delta, \lambda, R, \rho)$ such that if $a_i \in I$, $1 \le i \le n+1$, then

$$\delta((a_1, \ldots, a_n), a_{n+1}) = (a_2, \ldots, a_{n+1})$$

and

$$\lambda((a_1, \ldots, a_n), a_{n+1}) = a_1.$$

An (I, n)-delay machine simply delays its input for n time steps.
Stated more precisely, if $M^n$ is an (I, n)-delay machine then

$$\beta_{(a_1, \ldots, a_n)}(a_{n+1} \cdots a_{n+m}) = a_m.$$
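
The behavior of a delay machine can be illustrated with a minimal sketch. The function name and the list-based encoding of sequences below are assumptions made for illustration only; they are not part of the formal development.

```python
from collections import deque


def delay_machine_outputs(initial_state, inputs):
    """Simulate an (I, n)-delay machine started in state (a_1, ..., a_n).

    initial_state: tuple of the n symbols currently held (empty when n = 0).
    inputs: iterable of input symbols a_{n+1}, a_{n+2}, ...
    Returns one output symbol per input symbol.
    """
    state = deque(initial_state)
    outputs = []
    for symbol in inputs:
        state.append(symbol)             # next state drops a_1 and gains the new symbol
        outputs.append(state.popleft())  # the output is the oldest stored symbol
    return outputs


# With n = 2, the m-th output equals the m-th symbol of a_1 a_2 a_3 ...
print(delay_machine_outputs(("a1", "a2"), ["a3", "a4", "a5"]))
```

With n = 0 the state is empty and the sketch reduces to the identity function on I, which matches the remark made later about (I, 0)-delay machines.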

Definition 5.2: Let M and $\bar{M}$ be two machines such that $\bar{R} = R$ and
$Z = \bar{I}$. $\bar{M}$ is an (n-delayed) inverse of M if there exists an (I, n)-delay
machine $M^n$ with reset alphabet R such that for all $r \in R$ and
$x \in I^+$

$$\bar\beta_r(\hat\beta_r(x)) = \beta^n_r(x).$$

Note that if $\bar{M}$ is an inverse of M then $I \subseteq \bar{Z}$. However, it is
not necessary to have $I = \bar{Z}$. Symbols which are in $\bar{Z}$ but not in I
can be useful for diagnosis. Since they will never appear while $\bar{M}$
is receiving its input from M, the appearance of one immediately
signifies that an error has occurred.

$\bar{M}$ might more properly have been dubbed a "right inverse" of
M, for if $\bar{M}$ is an inverse of M it is not necessarily true that M is an
inverse of $\bar{M}$. This is illustrated in Example 5.1. This example
is a counter-example to the claims of Kohavi [20] and Even [13]
that if $\bar{M}$ is an inverse of M then M is an inverse of $\bar{M}$.

Example 5.1: Consider machines $M_1$ and $\bar{M}_1$ of Fig. 5.1. $\bar{M}_1$ is
a 0-delayed inverse of $M_1$ but $M_1$ is not an inverse of $\bar{M}_1$. In fact,
there is no machine which is an inverse of $\bar{M}_1$. This is because the
input symbols 0 and 2 are equivalent and so there is no way in which
they can be distinguished once they have been applied.

Fig. 5.1. Machines $M_1$ and $\bar{M}_1$

Definition 5.3: A machine M is information lossless of delay n if
for all $r \in R$ and $a_1 a_2 \cdots a_m,\ b_1 b_2 \cdots b_m \in I^+$ ($a_i, b_i \in I$, $1 \le i \le m$)

$$\hat\beta_r(a_1 a_2 \cdots a_m) = \hat\beta_r(b_1 b_2 \cdots b_m)$$

implies $a_i = b_i$ for $1 \le i \le m - n$.

M is said to be lossless if it is information lossless of delay
n for some nonnegative integer n. M is lossy if it is not lossless.

Example 5.2: Machine $M_1$ of Fig. 5.1 is information lossless of
delay 0 and machine $\bar{M}_1$ of Fig. 5.1 is lossy.

Fig. 5.2. Machine M in Series with an Inverse $\bar{M}$ of M

Referring to Fig. 5.2, if M is lossless and $\bar{M}$ is an inverse of
M then intuitively no information is lost as sequences from $I^+$ are
transformed into sequences from $Z^+$ by M. The same is true for
the entire process which consists of transforming sequences from $I^+$
into sequences from $Z^+$ and then back again. Therefore it is somewhat
surprising to see, as we have in Example 5.2, that $\bar{M}$ may be lossy.
This may occur because while $\bar{M}$ must lose no information in transforming
the sequences it observes at the output of M, M may not be
capable of producing all possible output sequences. Thus while $\bar{M}$
must be lossless with respect to a subset of $Z^+$ it may be lossy with
respect to all of $Z^+$.

Even [13] gives an algorithm for determining if a given machine
is lossless, and if so, of what delay. It is particularly easy to
determine whether a given machine is lossless of delay 0. This is
because a machine M is lossless of delay 0 if and only if the output
symbols in every row of the state table which corresponds to a state
$q \in P$ are all distinct.
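
The row test for losslessness of delay 0 can be sketched as follows. The dictionary encoding of the state table and the example machine are hypothetical and serve only to illustrate the test.

```python
def is_lossless_of_delay_0(output_table, reachable_states):
    """Return True when every reachable row of the state table has distinct outputs.

    output_table[q][a] is the output symbol emitted in state q on input a.
    reachable_states plays the role of P, the reachable part of the machine.
    """
    for q in reachable_states:
        row = list(output_table[q].values())
        if len(set(row)) != len(row):   # two inputs give the same output in state q
            return False
    return True


# Hypothetical two-state machine over inputs {0, 1}.
table = {
    "s0": {0: "x", 1: "y"},
    "s1": {0: "y", 1: "x"},
}
print(is_lossless_of_delay_0(table, ["s0", "s1"]))
```

Only the rows of reachable states are examined, since unreachable rows never influence the behavior of the machine.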

Machines for which inverse machines exist can be characterized
as being precisely those machines which are lossless. More precisely,

Theorem 5.1: M has an n-delayed inverse if and only if M is
information lossless of delay n.

Proof: (Necessity) Assume that $\bar{M}$ is an n-delayed inverse of M.
Let $r \in R$ and $a_1 \cdots a_m,\ b_1 \cdots b_m \in I^+$ ($a_i, b_i \in I$, $1 \le i \le m$) such
that $\hat\beta_r(a_1 \cdots a_m) = \hat\beta_r(b_1 \cdots b_m)$. We must show that $a_i = b_i$ for
all i, $1 \le i \le m - n$.

Since $\bar{M}$ is an n-delayed inverse of M there exists an (I, n)-delay
machine $M^n$ such that $\bar\beta_r \circ \hat\beta_r = \beta^n_r$. In particular,
$\bar\beta_r(\hat\beta_r(a_1 \cdots a_\ell)) = \beta^n_r(a_1 \cdots a_\ell) = a_{\ell - n}$ and
$\bar\beta_r(\hat\beta_r(b_1 \cdots b_\ell)) = \beta^n_r(b_1 \cdots b_\ell) = b_{\ell - n}$
for all $\ell$, $n < \ell \le m$.

Now $\hat\beta_r(a_1 \cdots a_m) = \hat\beta_r(b_1 \cdots b_m)$ implies
$\bar\beta_r(\hat\beta_r(a_1 \cdots a_\ell)) = \bar\beta_r(\hat\beta_r(b_1 \cdots b_\ell))$ for all $\ell$, $1 \le \ell \le m$.
Therefore $a_{\ell - n} = b_{\ell - n}$ for all $\ell$, $n < \ell \le m$. That is, $a_i = b_i$ for
all i, $1 \le i \le m - n$. Hence, M is lossless of delay n.

(Sufficiency) Given a machine M which is lossless of delay n, we
can show that M has an n-delayed inverse by constructing one. Techniques
for constructing inverses of lossless sequential machines can
be found in Hennie [17] and Kohavi [20]. With minor modifications
to ensure the existence of suitable starting states, these techniques
can be used to construct inverses of resettable machines.

5.2 Diagnosis Using Lossless Inverses

If $\bar{M}$ is an n-delayed inverse of M then, by definition, there
exists an (I, n)-delay machine $M^n$ such that $\bar\beta_r \circ \hat\beta_r = \beta^n_r$. Diagnosis
using inverses can be performed by implementing $\bar{M}$ and $M^n$ and
dynamically checking to see if the above relationship holds. The
basic configuration for diagnosis using inverses is shown in Fig. 5.3.



Fig. 5.3. On-line Diagnosis Using Inverse Machines

Since an (I, 0)-delay machine is simply a combinational machine 
which realizes the identity function on I, a detector which uses a 
0-delayed inverse will have the form shown in Fig. 5. 4. 


Fig. 5.4. A Detector which Uses a 0-delayed Inverse

We now state the basic result relating the use of lossless
inverses with the diagnosis of unrestricted faults.

Theorem 5.2: Let M be a lossless machine and let $\bar{M}$ be an n-delayed
inverse of M. Let D be constructed from $\bar{M}$, the (I, n)-delay machine
which demonstrates that $\bar{M}$ is an n-delayed inverse of M, and an
Exclusive-OR gate as shown in Fig. 5.3. If $\bar{M}$ is lossless of delay
d then (M, U) is (D, d)-2-diagnosable.

Proof: Since $\bar\beta_r(\hat\beta_r(x)) = \beta^n_r(x)$, there will be no false alarms.

Let $(r, x, w)$ be a minimal 2-error caused by a fault $f \in U$.
Then $\hat\beta^f_r(x) \ne \hat\beta_r(x)$. Let $y \in I^*$ with $|y| = d$. Since $\bar{M}$ is lossless
of delay d, $\bar\beta_r(\hat\beta^f_r(xy)) \ne \bar\beta_r(\hat\beta_r(xy))$. The Exclusive-OR gate will
detect this inequality, and hence the minimal 2-error will be detected
within d time steps of its occurrence. Therefore (M, U) is
(D, d)-2-diagnosable.

It is worth noting that the delay in diagnosis is not the delay of
losslessness of M but rather of its inverse $\bar{M}$. Thus an n-delayed
inverse can be used to achieve diagnosis without delay if it is lossless
of delay 0.

Example 5.6, which appears later in this chapter, shows that
the converse of Theorem 5.2 does not hold. Namely, it is possible
to diagnose the unrestricted fault set of a machine using an inverse
which is not lossless. However, not all inverses can be used for
the diagnosis of unrestricted faults. Example 5.5 shows how a lossy
inverse can be useless for diagnosis. The complete characterization
of inverses which can be used for unrestricted fault diagnosis
is still an open problem.

Given Theorem 5. 2 and the observation that an inverse machine 
may be lossy, an important question is whether every lossless 
machine has a lossless inverse. This question is presently unan- 
swered. However, it can be shown that if M is lossless of delay 
0 then there exists a lossless inverse of M. 


Example 5.3: Consider machines $M_2$ and $\bar{M}_2$ of Fig. 5.5. $M_2$ is
lossless of delay 2 and $\bar{M}_2$ is a 2-delayed inverse. Since $\bar{M}_2$
is lossless of delay 0 it can be used to form a detector $D_2$ such that
$(M_2, U)$ is $(D_2, 0)$-2-diagnosable.

Fig. 5.5. Machines $M_2$ and $\bar{M}_2$

The following example shows that it is possible to diagnose the
unrestricted fault set of a machine using a lossless inverse which
has fewer states than the reduction of the machine being diagnosed.


Example 5.4: Consider machines $M_3$ and $\bar{M}_3$ of Fig. 5.6. $\bar{M}_3$ is
a 2-delayed inverse of $M_3$, and $\bar{M}_3$ is itself lossless of delay 2.

Fig. 5.6. Machines $M_3$ and $\bar{M}_3$

Therefore a detector $D_3$ can be constructed from $\bar{M}_3$ and the
(I, 2)-delay machine $M^2$ of Fig. 5.7 such that $(M_3, U)$ will be
$(D_3, 2)$-2-diagnosable. Notice that $M_3$ is reduced and reachable and that
$|Q_3| > |\bar{Q}_3|$. However, because $M^2$ is also in the detector,
$|Q_{D_3}| = |\bar{Q}_3|\,|Q^2| = 16$. Therefore $|Q_3| < |Q_{D_3}|$. This is in
keeping with what we know from Corollary 4.6.2.

Fig. 5.7. Machine $M^2$


From Corollary 4.6.2 we know that if (M, U) is (D, k)-2-diagnosable
then $|Q_D| \ge |Q_R|$, where $M_R$ is the reduction of M. Using this
corollary and Theorem 5.2 we can derive a lower bound on the state
set size of a lossless inverse $\bar{M}$ of M. This bound is stated in terms
of the input alphabet size of M, the delay of losslessness of M, and
the state set size of $M_R$.


Theorem 5.3: Let M be lossless of delay n, let $M_R$ be the reduction
of M, and let $\bar{M}$ be a lossless n-delayed inverse of M. Then

$$|\bar{Q}| \ge \frac{|Q_R|}{|I|^n}.$$


Proof: Consider M to be realizing its reduction and consider M and $\bar{M}$
in the configuration used for diagnosis shown in Fig. 5.3. Since $\bar{M}$ is
lossless, by Theorem 5.2 (M, U) is (D, d)-2-diagnosable where d is
the delay of losslessness of $\bar{M}$. Now by Corollary 4.6.1 $|Q_D| \ge |Q_R|$.
Since $Q_D = \bar{Q} \times I^n$, $|Q_D| = |\bar{Q}|\,|I|^n$. Thus $|\bar{Q}|\,|I|^n \ge |Q_R|$,
or

$$|\bar{Q}| \ge \frac{|Q_R|}{|I|^n}.$$

If one has a lossless machine M of unknown delay and an inverse
$\bar{M}$ of M then a lower bound on the delay n of M can be found using the
following inequality:

$$n \ge \frac{\log |Q_R| - \log |\bar{Q}|}{\log |I|}.$$

This inequality was obtained directly from the one in Theorem 5.3.
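
As a small numerical illustration, the bound can be evaluated with integer arithmetic. The function name and its arguments are assumptions introduced here, and the sketch presumes the hypotheses of Theorem 5.3 (in particular that the inverse is lossless).

```python
def delay_lower_bound(reduced_states, inverse_states, input_symbols):
    """Smallest integer n >= 0 with inverse_states * input_symbols**n >= reduced_states.

    This is the least nonnegative integer satisfying
    n >= (log|Q_R| - log|Q-bar|) / log|I|, computed without floating point.
    """
    if input_symbols < 2:
        raise ValueError("the bound needs an input alphabet with at least two symbols")
    n = 0
    while inverse_states * input_symbols ** n < reduced_states:
        n += 1
    return n


# Hypothetical sizes: |Q_R| = 16, |Q-bar| = 4, |I| = 2.
print(delay_lower_bound(16, 4, 2))
```

When the inverse already has at least as many states as the reduction, the sketch returns 0 and the inequality gives no information about the delay.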

Given a machine $M = (I, Q, Z, \delta, \lambda, R, \rho)$ let $Z'$ denote the subset
of Z which may actually appear in an output sequence of M. That is,
let $Z' = \{\beta_r(x) \mid r \in R,\ x \in I^+\}$.

The following result gives a very simple necessary condition 
which all lossless machines must satisfy. 

Theorem 5.4: If M is lossless then $|I| \le |Z'|$.

Proof: Assume that M is lossless of delay n. Let $f_r: I^+ \to Z^+ \times Q$
be defined by $f_r(x) = (\hat\beta_r(x), \delta(\rho(r), x))$.

Claim: $f_r$ is 1-1. Let $x, y \in I^+$ where $x \ne y$. If $|x| \ne |y|$ then
$|\hat\beta_r(x)| \ne |\hat\beta_r(y)|$ and hence $f_r(x) \ne f_r(y)$. Let $|x| = |y|$ and
assume, to the contrary, that $f_r(x) = f_r(y)$. Then $\hat\beta_r(x) = \hat\beta_r(y)$ and
$\delta(\rho(r), x) = \delta(\rho(r), y)$. This implies that $\hat\beta_r(xz) = \hat\beta_r(yz)$ for all
$z \in I^*$, and in particular for some z of length n. Since M is lossless of delay n
this implies that $x = y$. Contradiction. Hence if $|x| = |y|$ and $x \ne y$
then $f_r(x) \ne f_r(y)$. Since either $|x| = |y|$ or $|x| \ne |y|$, the claim
is established.

Since $f_r: I^+ \to Z^+ \times Q$ is 1-1 and $|x| = |\hat\beta_r(x)|$ it follows that
$|I|^m \le |Z'|^m |Q|$ for all $m > 0$. Hence $|I|^m / (|Z'|^m |Q|) \le 1$ for
all $m > 0$. Since $|Q|$ is a fixed positive integer, this implies that
$|I| / |Z'| \le 1$, or $|I| \le |Z'|$.
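
The last step of the proof can be spelled out as follows. This is only an expanded restatement of the argument, with the ratio $c$ introduced here as a shorthand.

```latex
% Let c = |I| / |Z'|. The counting bound gives, for every m > 0,
\[
  c^{m} \;=\; \frac{|I|^{m}}{|Z'|^{m}} \;\le\; |Q| .
\]
% If c > 1, then c^m grows without bound, and for any
\[
  m \;>\; \frac{\log |Q|}{\log c}
\]
% we would have c^m > |Q|, contradicting the bound above.
% Hence c <= 1, that is, |I| <= |Z'|.
```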

This result has some immediate corollaries concerning inverses 
of lossless machines. 

Corollary 5.4.1: Let M be a lossless machine with $|I| < |Z'|$.
Then any inverse $\bar{M}$ of M with $\bar{Z}' = I$ is lossy.

Proof: Let $\bar{M}$ be an inverse of M with $\bar{Z}' = I$. Since $\bar{M}$ is an inverse
of M, $Z' \subseteq \bar{I}$, and we know that $|I| < |Z'|$. Hence
$|\bar{Z}'| = |I| < |Z'| \le |\bar{I}|$. By Theorem 5.4, $\bar{M}$ must be lossy.

This corollary says that if M is lossless and $|I| < |Z'|$ then
for an inverse $\bar{M}$ of M to be lossless $\bar{M}$ must have output symbols
which would never appear while $\bar{M}$ is receiving its input from M.
However, if a fault occurs to M and causes an error then $\bar{M}$ could
emit one of these symbols. The appearance of one of these symbols
in $\bar{M}$'s output would immediately cause an error detection signal
because this same symbol cannot appear in the output of an (I, n)-delay
machine.

Corollary 5.4.2: Let M be a lossless machine with a lossless
inverse $\bar{M}$. If $\bar{Z}' = I$ then $|I| = |Z'|$.

Proof: This follows immediately from Corollary 5.4.1.

Given the above result, an immediate question is whether M being
lossless and $|I| = |Z'|$ implies that any inverse $\bar{M}$ of M is lossless.
As Example 5.5 shows, the answer is no.

Example 5.5: Consider machine $\bar{M}'_3$ of Fig. 5.8. $\bar{M}'_3$ is an inverse
of machine $M_3$ of Fig. 5.6 and $I_3 = Z'_3$, but $\bar{M}'_3$ is not lossless.

Fig. 5.8. Machine $\bar{M}'_3$

5.3 Applicability of Inverses for Unrestricted Fault Diagnosis
The use of inverses as a technique for performing diagnosis 
applies directly only to those machines which have suitable inverses. 
In the following development we will show that given an arbitrary 
machine M', we can always construct a realization M of M' such that 
M has an inverse which can be used for diagnosis. The realizations 
will be obtained simply by augmenting the output of the original 
machine. Thus we will show that diagnosis using inverses is a 
universally applicable technique. 

Definition 5.4: M is an output-augmented realization of M' if
$M = (I', Q', Z' \times A, \delta', \lambda, R', \rho')$ and $\lambda = \lambda' \times \lambda_A$ for some
$\lambda_A: Q' \times I' \to A$.

If M is an output-augmented realization of M' then M realizes
M' under $(e, e, P_{Z'})$ where $P_{Z'}$ is the projection of $Z' \times A$ onto $Z'$.
Kohavi and Lavallee [19] have given a construction which
proves the following results.

Theorem 5.5: Given any machine M', there exists an output-augmented
realization M of M' which is lossless of delay n for
some n, and in particular, for n = 0.

Theorem 5.6: If M' is lossless of delay n, then for every m,
$0 \le m \le n$, there exists an output-augmented realization M of M'
which is lossless of delay m.

The method that Kohavi and Lavallee use to achieve the above
results employs a "testing graph" which is used to determine if the
given machine M' is lossless, and if so of what delay. Output augmentation
which will yield the desired property is determined by a
method of cutting branches in this graph. Minimal augmentation
for losslessness of a desired delay is not guaranteed.

A lower bound on the amount of output-augmentation necessary
to make a particular machine lossless is given by Theorem 5.4.
This result tells us that for the output-augmented realization to be
lossless, the size of its output alphabet must be at least as
great as the size of its input alphabet.

Any machine can be made lossless of delay 0 simply by aug- 
menting its output with a copy of the input. This gives an upper 
bound on the amount of output augmentation which is necessary to 
make a given machine lossless of delay 0. 
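
A minimal sketch of this augmentation, assuming a dictionary encoding of the output function (the encoding and the example table are hypothetical):

```python
def augment_output_with_input(output_table):
    """Pair every output symbol with the input symbol that produced it.

    output_table[q][a] is the original output in state q on input a.
    In the result, the output for (q, a) is the pair (original output, a),
    so the outputs within each row are necessarily distinct.
    """
    return {
        q: {a: (z, a) for a, z in row.items()}
        for q, row in output_table.items()
    }


# Hypothetical lossy machine: state "s0" emits "x" on both inputs.
original = {"s0": {0: "x", 1: "x"}, "s1": {0: "x", 1: "y"}}
augmented = augment_output_with_input(original)
print(augmented["s0"])
```

Projecting each augmented output onto its first coordinate recovers the original output, which is what makes the result an output-augmented realization in the sense of Definition 5.4.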

It is tempting to use the Kohavi and Lavallee technique to augment
the inverse of a machine in the hope of achieving a lossless
inverse. However, this is impossible because an output-augmented
realization of an inverse $\bar{M}$ of M is not necessarily an inverse of M.

Example 5.6: Consider the configuration shown in Fig. 5.9. Here
M' is any machine, and M is the output-augmented realization of M'
which was formed simply by augmenting the output of M' with a
copy of its input. The inverse $\bar{M}$ of M shown in this figure is
simply the combinational machine which realizes the projection of
$Z \times I$ onto I. This inverse is lossy and is clearly useless for
diagnosis.

Fig. 5.9. A Lossless Machine with a Lossy Inverse

Now augment the output of $\bar{M}$ to form the machine $\bar{M}'$ shown
in Fig. 5.10. This machine is lossless but it is not an inverse of
M and it too is useless for diagnosis.

Fig. 5.10. An Output-augmented Realization of $\bar{M}$ of Fig. 5.9

Although Kohavi and Lavallee's technique cannot be used to 
construct lossless inverses, it is an important technique because 
it can be used to construct lossless of delay 0 realizations of any 
given machine. The following result shows that given a machine 
which is lossless of delay 0, an inverse of that machine can be 
constructed which can be used for the diagnosis of unrestricted 
faults. 

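Theorem 5.7: Let M be lossless of delay 0. Then there exists
an inverse $\bar{M}$ of M such that (M, U) is (D, 0)-2-diagnosable where
D is formed from $\bar{M}$ and an Exclusive-OR gate as shown in Fig. 5.4.

Proof: Let $\bar{M} = (Z, P, I \cup \{e\}, \bar\delta, \bar\lambda, R, \rho)$ where $e \notin I$ and for all
$q \in P$ and $a \in Z$

$$\bar\delta(q, a) = \begin{cases} \delta(q, b) & \text{if } b \in I \text{ and } \lambda(q, b) = a \\ \text{arbitrary} & \text{if } a \notin \lambda(q, I) \end{cases}$$

$$\bar\lambda(q, a) = \begin{cases} b & \text{if } b \in I \text{ and } \lambda(q, b) = a \\ e & \text{if } a \notin \lambda(q, I) \end{cases}$$

Thus $\bar{M}$ is basically the same as M but with the roles of the
input and output interchanged.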



The functions $\bar\delta$ and $\bar\lambda$ are well-defined, for if M is lossless
of delay 0 and $q \in P$ then $\lambda(q, a) = \lambda(q, b)$ implies $a = b$.

If $|I| < |Z|$ then not every symbol in Z can appear in every
row of the state table of M. This is what gives rise to the transitions
of $\bar{M}$ which may be arbitrarily specified.

Consider M and $\bar{M}$ to be operating in series as shown in Fig. 5.2.
Since M and $\bar{M}$ have the same reset function, they will initially
be in the same state. Now if M and $\bar{M}$ are both in some state $q \in P$
and the input symbol $b \in I$ is applied to M then M will emit $\lambda(q, b)$
and go to state $\delta(q, b)$. $\bar{M}$ will emit $\bar\lambda(q, \lambda(q, b)) = b$ and will go to
state $\bar\delta(q, \lambda(q, b)) = \delta(q, b)$. Thus M and $\bar{M}$ will make the same
state transitions and the present output of $\bar{M}$ will always be the
present input to M. Hence $\bar{M}$ is a 0-delayed inverse of M.
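
The construction used in this proof, together with the loop check of Fig. 5.4, can be sketched as follows. The dictionary encoding, the function names, the choice made for the arbitrary transitions, and the example machine are all assumptions made for illustration.

```python
ERROR = "e"  # assumed to be a symbol that is not in the input alphabet I


def build_inverse(delta, lam, states, inputs, outputs):
    """Build (inv_delta, inv_lam) for a machine that is lossless of delay 0.

    delta[(q, b)] and lam[(q, b)] are the next state and output of M.
    """
    inv_delta, inv_lam = {}, {}
    for q in states:
        for b in inputs:
            a = lam[(q, b)]
            if (q, a) in inv_lam:
                raise ValueError("machine is not lossless of delay 0")
            inv_delta[(q, a)] = delta[(q, b)]   # follow the same transition as M
            inv_lam[(q, a)] = b                 # recover the input that produced a
        for a in outputs:
            if (q, a) not in inv_lam:
                inv_delta[(q, a)] = q           # arbitrary; this sketch stays in q
                inv_lam[(q, a)] = ERROR         # a is never emitted by M in state q
    return inv_delta, inv_lam


def first_alarm(inv_delta, inv_lam, start, inputs_seen, outputs_seen):
    """Index of the first step at which the loop check fails, or None."""
    q = start
    for step, (b, a) in enumerate(zip(inputs_seen, outputs_seen)):
        if inv_lam[(q, a)] != b:                # regenerated input differs from actual input
            return step
        q = inv_delta[(q, a)]
    return None


# Hypothetical machine with states {0, 1}, inputs {0, 1}, outputs {x, y, z}.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
lam = {(0, 0): "x", (0, 1): "y", (1, 0): "y", (1, 1): "z"}
inv_delta, inv_lam = build_inverse(delta, lam, [0, 1], [0, 1], ["x", "y", "z"])

# Fault-free output sequence for the input 0 1 1 0 started in state 0.
print(first_alarm(inv_delta, inv_lam, 0, [0, 1, 1, 0], ["x", "y", "z", "x"]))
```

Because the comparison is made on the same step in which the output is observed, an erroneous output symbol raises the alarm without delay in this sketch.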

It remains to be shown that (M, U) is (D, 0)-2-diagnosable. This
must be shown directly because $\bar{M}$ is not necessarily lossless.

Since $\bar{M}$ is a 0-delayed inverse of M there will be no false alarms.
Let $(r, xa, wb)$ where $a \in I$ and $b \in Z$ be a minimal 2-error. Since
any input sequence applied to M will cause M and $\bar{M}$ to experience
the same state trajectories, $\delta(\rho(r), x) = \bar\delta(\rho(r), w)$. Say $\delta(\rho(r), x) = q$.
Since $(r, xa, wb)$ is a minimal 2-error, $\beta_r(xa) \ne b$. Now
$\bar\lambda(q, \beta_r(xa)) = a$ and therefore $\bar\lambda(q, b) \ne a$. This inequality will be
detected by the Exclusive-OR gate which will emit a fault detection
signal. Hence (M, U) is (D, 0)-2-diagnosable.

It should be noted that the inverse constructed in the proof of
the above theorem is not necessarily lossless. By using $|Z| - |I|$
new symbols, instead of just one, $\bar{M}$ could have been constructed to
be lossless of delay 0.

Example 5.7: Consider machine $\bar{M}'_1$ of Fig. 5.11. This machine
is an inverse of machine $M_1$ of Fig. 5.1. It was constructed as
described in the proof of Theorem 5.7. The transitions of $\bar{M}'_1$ which
may be arbitrarily chosen are marked in the figure. This inverse of
$M_1$ is not lossless, but it can be used for the diagnosis of unrestricted
faults of $M_1$.

Fig. 5.11. Machine $\bar{M}'_1$

A lossless inverse $\bar{M}''_1$ of $M_1$ can be obtained from $\bar{M}'_1$ simply
by changing one of the "e" outputs in each row of the state table of
$\bar{M}'_1$ to e'. $\bar{M}''_1$ so constructed would be lossless of delay 0 because
the output symbols would be distinct in every row of the state table
of $\bar{M}''_1$.

CHAPTER VI

Diagnosis of Networks of Resettable Systems

In this chapter we will consider the problem of diagnosing a 
machine which has been structurally decomposed and is represented 
as a network of resettable state machines. The networks that we 
will be using are very general and they will allow us to work within 
a wide range of structural detail. 

The fault set which we will be applying to these networks is the
set of "unrestricted component faults." Informally, an unrestricted
component fault is a fault which only affects one component machine
but which may affect that component in an unrestricted manner.

This fault set is a natural restriction of the set of unrestricted
faults. We will show that it is possible to diagnose the set of unrestricted
component faults of a network with relatively little redundancy.

This chapter focuses on the diagnosis of "state networks."

A state network is simply a network in which the external output is 
the state of the network, i. e. , a vector consisting of the state of each 
component machine in the network. Since the state of a state network 
is directly observable at its output, state networks are easier to 
diagnose than arbitrary networks. 






The results in this chapter characterize state networks which are 
diagnosable using combinational detectors. A general construction 
is given which can be used to augment a given state network such 
that the resulting state network is diagnosable in the above sense. 
Upper and lower bounds on the amount of redundancy required by
such an augmentation are derived.





6.1 Networks of Resettable Systems

The field of study known as "algebraic structure theory of
sequential machines" is concerned with the synthesis and decomposition
of sequential machines into networks of smaller component
machines. The networks considered in this chapter are very
similar to the "abstract networks" introduced by Hartmanis and
Stearns [16]. The major differences are in our use of resettable
state systems for the components and in our system connection rules
which force all computation to be done in the component systems or
in the external output function. Hartmanis and Stearns use sequential
state machines for their components and they allow for a combinational
function $f_i: (\times_j Q_j) \times I \to I_i$ to precede each component.

Definition 6.1: A network of resettable systems is a 6-tuple
$N = (I, R, (S_1, \ldots, S_n), (K_1, \ldots, K_n), Z, \lambda)$ where

I is a finite nonempty set, the external input alphabet
R is a finite nonempty set, the external reset alphabet
$S_i = (I_i, Q_i, \delta_i, R, \rho_i)$ for each i, $1 \le i \le n$, is a resettable
state system, a component system
$K_i$ for each i, $1 \le i \le n$, is a subset of $\{Q_1, \ldots, Q_n, I\}$,
a system connection rule
Z is a finite nonempty set, the external output alphabet
$\lambda: (\times_{i=1}^{n} Q_i) \times I \times T \to Z$, the external output function

such that for each i, $1 \le i \le n$, if
$K_i = \{A_1, \ldots, A_\ell\}$ then $I_i = \times_{j=1}^{\ell} A_j$.

Under the intended interpretation, the system connection rule 
specifies from which parts of the network component i receives 
its input. 

By the convention we introduced in Section 2.1, if $K_i = \emptyset$ then
$I_i$ is any singleton set. Therefore if $M_i$ has no connections then it
is an autonomous machine.

Example 6.1: The 6-tuple described in Fig. 6.1 specifies network
$N_1$. This network has two component machines $M_1$ and $M_2$ with
state sets $\{p_1, p_2\}$ and $\{q_1, q_2\}$ respectively. $M_1$ is connected to
the external input and the output (state) of $M_2$ and $M_2$ is connected
to the external input and the output (state) of $M_1$. Network $N_1$ can
be viewed pictorially as shown in Fig. 6.2.

Fig. 6.1. Network $N_1$

Fig. 6.2. Diagram of Network $N_1$

Since any machine may be viewed as a one component network 
we see that a network may convey little or no structural information. 
On the other hand the structural description given by the network 
may be very detailed. For example, each component may be a two- 
state state machine which represents only one flip-flop and one 
coordinate of the global transition function. 

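Definition 6.2: A network $N = (I, R, (S_1, \ldots, S_n), (K_1, \ldots, K_n), Z, \lambda)$
defines the system $S_N = (I, Q, Z, \delta, \lambda, R, \rho)$ where

$$Q = \times_{i=1}^{n} Q_i$$

$$\delta(q, a, t) = \delta((q_1, \ldots, q_n), a, t) = \times_{i=1}^{n} \delta_i[q_i, P_{K_i}(q_1, \ldots, q_n, a), t]$$

$$\rho(r, t) = \times_{i=1}^{n} \rho_i(r, t)$$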
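
For the time-invariant case, the way a network defines a global transition function can be sketched as below. The encoding of connection rules as lists of sources, the `EXTERNAL` marker, and the two example components are hypothetical; they illustrate only the idea that each component sees exactly the coordinates named by its connection rule.

```python
EXTERNAL = "I"  # hypothetical marker for the external input in a connection rule


def network_step(component_deltas, connections, state, external_input):
    """Compute the next global state of a network of machines.

    component_deltas[i](q_i, local_input) is the transition function of component i.
    connections[i] lists the sources feeding component i: an index j stands for the
    state of component j, and EXTERNAL stands for the external input.
    """
    next_state = []
    for i, (delta_i, sources) in enumerate(zip(component_deltas, connections)):
        local_input = tuple(
            external_input if src == EXTERNAL else state[src] for src in sources
        )
        next_state.append(delta_i(state[i], local_input))
    return tuple(next_state)


# Two hypothetical binary components wired as in Example 6.1: each one reads
# the state of the other component and the external input.
deltas = [
    lambda q, x: x[0] ^ x[1],   # component 0 ignores its own state
    lambda q, x: q ^ x[0],      # component 1 toggles when component 0 is in state 1
]
wiring = [[1, EXTERNAL], [0, EXTERNAL]]
print(network_step(deltas, wiring, (0, 1), 1))
```

A component with an empty connection rule receives the empty tuple as its input on every step, which corresponds to the singleton input alphabet of an autonomous machine.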


A network of resettable machines is a network in which the
component systems and the external output function are all time-invariant.
For example, network $N_1$ of Fig. 6.1 is a network of
machines. The system defined by a network of machines N is also
time-invariant, and it will be denoted by $M_N$. A network of machines
N realizes a machine M if $M_N$ realizes M. Likewise the definitions
of reduced machines, reachable machines, and so forth can
be extended to apply to networks of machines.


Example 6.2: Consider network $N_1$ of Fig. 6.1. This network
defines machine $M_{N_1}$ of Fig. 6.3 and it realizes $M_1$ of Fig. 6.4
because $M_{N_1}$ realizes $M_1$.

Fig. 6.4. Machine $M_1$


A network $N = (I, R, (S_1, \ldots, S_n), (K_1, \ldots, K_n), Z, \lambda)$ is a state
network if $Z = \times_{i=1}^{n} Q_i$ and $\lambda(q, a) = q$ for all $q \in \times_{i=1}^{n} Q_i$ and
$a \in I$. If N is a state network then $S_N$ is a state system. For state
networks it is unnecessary to explicitly specify the external output
alphabet and the external output function.

Since the fault set which we will be considering does not allow
for faults which affect the external output function, we will focus on
the diagnosis of state networks which realize state machines. The
diagnosis of the output function will be taken care of separately,
possibly by duplication.

Performing diagnosis on state networks is easier, in general, 
than for arbitrary networks because with state networks the output 
function does not mask the internal operation of the network. 

Decomposing a network into a state network and an output function 
and then diagnosing each separately has the effect of applying a 
tighter tolerance relation to the diagnosis of the original network. 

This is also due to the lack of any masking of the state by the out- 
put function. 





6.2 Unrestricted Component Faults

Suppose that N and N' are networks. Then $f = (N', \tau, \theta)$ is a
fault of N if $f' = (S_{N'}, \tau, \theta)$ is a fault of $S_N$. Thus a fault of N can be
considered to be a transformation of N into another network N' at
some time $\tau$. The notions of fault tolerance, error, and diagnosis
are extended in a similar manner to apply to networks.

Given a network N, a natural set of faults to consider are those
which are caused by failures in one component of N. If $f = (N', \tau, \theta)$
is caused by failures which are restricted to one component of N then
N' will differ from N only in that one component. Likewise
$\theta: \times Q_i \to \times Q'_i$ will act as the identity on each coordinate except possibly
the one affected by f. These faults are described formally in the
following definition.


Definition 6.3: Let $N = (I, R, (M_1, \ldots, M_n), (K_1, \ldots, K_n), Z, \lambda)$ be
a network of machines. A fault $f = (N', \tau, \theta)$ of N is an unrestricted
component fault if for some j, $1 \le j \le n$

i) $N' = (I, R, (M_1, \ldots, S'_j, \ldots, M_n), (K_1, \ldots, K_n), Z, \lambda)$ where
$S'_j \in \mathcal{S}(I_j, Q_j, R)$ and

ii) for all $(q_1, \ldots, q_n) \in \times_{i=1}^{n} Q_i$, $\theta(q_1, \ldots, q_n) = (q'_1, \ldots, q'_n)$
implies $q_i = q'_i$ for all $i \ne j$.


The set of all unrestricted component faults of a network will 


be denoted by U^.

Note that since N' is a network, $S'_j$ is required to be a state
system. Because the output alphabets of $M_j$ and $S'_j$ are identical
and they are both state systems their state sets must also be identical.
Thus, unrestricted component faults do not permit state blowup
or collapse.

The set of unrestricted component faults is sufficiently restricted to
make possible its diagnosis with relatively little redundancy. On the
other hand, it is not unduly restricted, for it allows for any number
and type of physical failures to occur to any one component, subject,
of course, to the general restrictions on faults outlined in Section 2.3.
Thus using it as the fault class greatly reduces the amount of failure
analysis which is necessary within the components.

6.3 Characterization of Combinationally Diagnosable Networks
How can state networks for which a combinational detector can 
diagnose the set of unrestricted component faults be characterized? 
We shall show that one means of doing this is in terms of the amount 
of network redundancy. 

Given a network of machines $N$ we will assume, as we have
earlier, that $N$ realizes some reduced and reachable machine $M$.
Since the relation between the state set of $N$ and the state set of $M$
will be of interest to us, we will use the structurally oriented
characterization of a realization given by Theorem A.1, and will assume
that $N$ realizes $M$ under $(\eta_1, \eta_2, \eta_3, \eta_4)$. We will assume as before
that $\eta_1$ and $\eta_2$ are onto. The natural extensions of $\eta_1$ and $\eta_3$ to
sequence-to-sequence mappings will also be denoted by $\eta_1$ and $\eta_3$.

The reachable part of $N$ will be denoted by $P$.

Since $M$ is reachable, the domain of $\eta_4$ is $Q$. Since $M$ is reduced
and $\eta_1$ and $\eta_2$ are onto, it can be shown that: i) $q, q' \in Q$ and $q \ne q'$
implies $\eta_4(q) \cap \eta_4(q') = \emptyset$, and ii) $\bigcup_{q \in Q} \eta_4(q) = P$. Let $\eta'_4 : P \to Q$
where $\eta'_4(q) = q'$ if and only if $q \in \eta_4(q')$. Because $\eta_4$ induces a
partition of $P$, $\eta'_4$ is a well-defined function. Through an abuse of
notation, $\eta'_4$ will be referred to more suggestively as $\eta_4^{-1}$. This
function will play an important role in the following results.

If $N$ is a state network which realizes a state machine $M$ under
$(\eta_1, \eta_2, \eta_3, \eta_4)$ then by Theorem A.1 $\eta_3(q) = \eta_4^{-1}(q)$ for all $q \in P$.
In this case we will take $\eta_3$ to be identical to $\eta_4^{-1}$.

Notation: Given a network $N$ let $C \subseteq \{1, \ldots, n\}$ denote a subset of
the set of components. Let $C_i$ denote the particular subset
$\{1, \ldots, i-1, i+1, \ldots, n\}$. Let $q = (q_1, \ldots, q_n)$ and $s = (s_1, \ldots, s_n)$ be states
of $N$.

Each $C$ induces a partition $\pi_C$ on $Q = \times_{i=1}^{n} Q_i$ where $q \equiv s\ (\pi_C)$ if
and only if $q_i = s_i$ for all $i \in C$.

A cover of a set $L$ is a set of subsets of $L$ whose union is $L$.
Thus every partition of $L$ is also a cover of $L$. A cover $J$ of $L$ is
a singleton cover if $B \in J$ implies $|B| \le 1$. If $J$ is a cover, let
$\#|J|$ denote the cardinality of the largest element in $J$.

Let $C \subseteq \{1, \ldots, n\}$ and let $\pi_C = \{B_1, \ldots, B_\ell\}$. $C$ induces the
cover

$$\hat{C} = \{\eta_4^{-1}(B_1 \cap P), \ldots, \eta_4^{-1}(B_\ell \cap P)\} \text{ of } Q$$

where if $B \subseteq P$ then $\eta_4^{-1}(B) = \{\eta_4^{-1}(q) \mid q \in B\}$. In particular,
$\eta_4^{-1}(\emptyset) = \emptyset$.

Each set of states which the components in $C$ can take on
corresponds directly to a block of the partition $\pi_C$. Thus $\pi_C$
represents the information about the current state of $N$ which is
given by the current states of the components in $C$. $\hat{C}$ represents the
corresponding information as to the state of $M$ which $N$ is currently
mimicking. If $\hat{C}$ is a singleton cover then the current states of the
components in $C$ completely determine the corresponding state of $M$.
Note that the cover induced by $\{1, \ldots, n\}$ is always a singleton cover.

Definition 6.4: Component $M_i$ of a network $N$ is redundant if $\hat{C}_i$
(the cover induced by $C_i$) is a singleton cover. $N$ is totally redundant
if every component of $N$ is redundant.

If $N$ is totally redundant then knowledge of the state of any $n-1$
components is sufficient to determine the corresponding state of $M$,
although it may not be sufficient to determine the state of the
remaining component.

Example 6.3: Consider network $N_1$ of Example 6.1. $N_1$ realizes the
machine of Fig. 6.4 under $(e, e, e, \eta_4)$ where $\eta_4$ is defined by
the table:

    $q$  |  $\eta_4(q)$
    -----+-------------
    a    |  $(p_1, q_1)$
    b    |  $(p_1, q_2)$
    c    |  $(p_2, q_2)$
    d    |  $(p_2, q_1)$

Now $\pi_{C_1} = \pi_{\{2\}} = \{\{(p_1, q_1), (p_2, q_1)\}, \{(p_1, q_2), (p_2, q_2)\}\}$ and so

$$\hat{C}_1 = \{\eta_4^{-1}(\{(p_1, q_1), (p_2, q_1)\}),\ \eta_4^{-1}(\{(p_1, q_2), (p_2, q_2)\})\} = \{\{a, d\}, \{b, c\}\}.$$

Therefore $\hat{C}_1$ is not a singleton cover, $M_1$ is not a redundant
component, and $N_1$ is not totally redundant.
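
The computation in Example 6.3 can be sketched in code. The following Python fragment is illustrative only: the function names and the dictionary encoding of the $\eta_4$ table are assumptions introduced here, and it covers only the case in which each state of M is represented by exactly one network state, as in the table of this example.

```python
def induced_cover(eta4, kept):
    """Group the states of M by the states of the components listed in `kept`.

    eta4 maps each state of M to the single network state (a tuple) that
    represents it. `kept` holds the 0-based indices of the components in C.
    Empty blocks are omitted; this does not change the singleton test.
    """
    blocks = {}
    for m_state, n_state in eta4.items():
        key = tuple(n_state[i] for i in kept)
        blocks.setdefault(key, set()).add(m_state)
    return sorted(blocks.values(), key=sorted)


def is_singleton_cover(cover):
    """A cover is a singleton cover if no block has more than one element."""
    return all(len(block) <= 1 for block in cover)


def is_redundant(eta4, i, n):
    """Component i is redundant if the other n-1 components determine the state of M."""
    kept = [j for j in range(n) if j != i]
    return is_singleton_cover(induced_cover(eta4, kept))


def is_totally_redundant(eta4, n):
    return all(is_redundant(eta4, i, n) for i in range(n))


# The eta_4 table of Example 6.3 (component 0 has states p1, p2; component 1 has q1, q2).
ETA4_EX63 = {
    "a": ("p1", "q1"),
    "b": ("p1", "q2"),
    "c": ("p2", "q2"),
    "d": ("p2", "q1"),
}

cover_c1 = induced_cover(ETA4_EX63, kept=[1])  # drop the first component
print(cover_c1)
print(is_redundant(ETA4_EX63, 0, 2))
```

With this encoding the cover obtained by dropping the first component has the two blocks {a, d} and {b, c}, in agreement with the example.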

If $\pi$ is a partition of $L$, let $f_\pi : L \to \pi$ denote the natural mapping
induced by $\pi$. Let $\xi_C : Q \to \hat{C}$ be defined as $\xi_C(q) = \eta_4^{-1}(f_{\pi_C}(q) \cap P)$.
The interpretation of $\xi_C$ is as follows: given the state of
each component in $C$, take any $q \in Q$ which agrees with this information,
and $\xi_C(q)$ is the set of states of $M$ to which the current state of
$N$ may correspond.

Lemma 6.1: Let $N$ be a totally redundant state network of machines,
and let $q = (q_1, \ldots, q_i, \ldots, q_n)$ and $q' = (q_1, \ldots, q'_i, \ldots, q_n)$ be states
of $N$. If $q, q' \in P$ then $\eta_4^{-1}(q) = \eta_4^{-1}(q')$.

Proof: Let $q, q' \in P$ and let $C = \{1, \ldots, n\}$. Then

$$\xi_C(q) = \eta_4^{-1}(f_{\pi_C}(q) \cap P) \subseteq \eta_4^{-1}(f_{\pi_{C_i}}(q) \cap P) = \xi_{C_i}(q).$$

Since $N$ is totally redundant, $\hat{C}_i$ is a singleton cover. Therefore
$|\xi_{C_i}(q)| \le 1$, and hence $\xi_C(q) = \emptyset$ or $\xi_C(q) = \xi_{C_i}(q)$. Now
$q \in P$ implies $\xi_C(q) = \{\eta_4^{-1}(q)\} \ne \emptyset$ and thus $\xi_C(q) = \xi_{C_i}(q)$.
Likewise, $\xi_C(q') = \xi_{C_i}(q')$. Now

$$\xi_{C_i}(q) = \eta_4^{-1}(f_{\pi_{C_i}}(q) \cap P) = \eta_4^{-1}(f_{\pi_{C_i}}(q') \cap P) = \xi_{C_i}(q').$$

Therefore $\xi_C(q) = \xi_C(q')$, and hence $\eta_4^{-1}(q) = \eta_4^{-1}(q')$.


Suppose that an unrestricted component fault $f$ occurs to a
totally redundant network of machines $N$ and causes a minimal
2-error $(r, x, y)$. Say that $\beta_r(x) = q = (q_1, \ldots, q_n)$. Due to the
nature of $f$, namely that it affects only one component, $\beta^f_r(x) =
q' = (q_1, \ldots, q'_i, \ldots, q_n)$. If $q' \in P$ then Lemma 6.1 tells us that
this 2-error is not a 1-error because $\eta_4^{-1}(q) = \eta_4^{-1}(q')$.


Theorem 6.2: Let $N$ be a state network which realizes a state
machine $M$ under $(\eta_1, \eta_2, \eta_3, \eta_4)$ where $\eta_3 = \eta_4^{-1}$. Then $(N, U_C)$
is $(D, 0)$-1-diagnosable for some combinational detector $D$ if and
only if $N$ is totally redundant.

Proof: (Necessity) Suppose that $(N, U_C)$ is $(D, 0)$-1-diagnosable
where $D$ is combinational, and let $D$ realize the function $\lambda_D$. Assume,
to the contrary, that $N$ is not totally redundant. Then for some $i$,
$\hat{C}_i$ is not a singleton cover. Hence there exist $q = (q_1, \ldots, q_i, \ldots, q_n)$
and $q' = (q_1, \ldots, q'_i, \ldots, q_n)$ such that $q, q' \in P$ and $\eta_4^{-1}(q) \ne \eta_4^{-1}(q')$.
Since $q, q' \in P$, $\lambda_D(q) = \lambda_D(q') = 0$, for otherwise a false alarm could
occur. Let $f \in U_C$ be a fault caused by the output of $M_i$ becoming
stuck-at-$q'_i$ at a time when $M_i$ could be in $q_i$. This fault can cause
a 1-error which is not $(D, 0)$-1-diagnosable. Contradiction. Therefore
if $(N, U_C)$ is $(D, 0)$-1-diagnosable where $D$ is combinational then
$N$ must be totally redundant.

(Sufficiency) Assume that $N$ is totally redundant. Let $D$ be the
detector which realizes the function $\lambda_D : Q \to \{0, 1\}$ where

$$\lambda_D(q) = \begin{cases} 0 & \text{if } q \in P \\ 1 & \text{if } q \notin P \end{cases}$$

Clearly, $D$ will give no false alarms.

Let $(r, x, y)$ be a minimal 1-error caused by $f \in U_C$. Let $x = uab$
where $a, b \in I$.

Then $\eta_4^{-1}(\beta_r(ua)) = \eta_4^{-1}(\beta^f_r(ua))$ and $\eta_4^{-1}(\beta_r(uab)) \ne \eta_4^{-1}(\beta^f_r(uab))$. Say
$\beta^f_r(ua) = q$. Then $\beta^f_r(uab) = \delta^f(q, a, t)$ where $t = |u|$. Because $f \in U_C$,
$f$ can affect at most one component of $N$. Therefore $\delta(q, a)$ will
differ in at most one coordinate from $\delta^f(q, a, t)$. Let $\delta(q, a) = s =
(s_1, \ldots, s_i, \ldots, s_n)$ and let $\delta^f(q, a, t) = s' = (s_1, \ldots, s'_i, \ldots, s_n)$.
Since $\eta_4^{-1}(q) = \eta_4^{-1}(\beta_r(ua))$, by Theorem A.1, $\eta_4^{-1}(\delta(q, a)) = \eta_4^{-1}(\beta_r(uab))$.
Therefore $s \in P$, and $\eta_4^{-1}(s) \ne \eta_4^{-1}(s')$ because $\eta_4^{-1}(\beta_r(uab)) \ne \eta_4^{-1}(\beta^f_r(uab))$.
Applying Lemma 6.1 we deduce that $s' \notin P$. Therefore $\lambda_D(s') = 1$,
the 1-error $(r, x, y)$ is detected without delay, and $(N, U_C)$ is
$(D, 0)$-1-diagnosable.
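
The detector used in the sufficiency argument simply tests membership in the reachable part. A minimal Python sketch follows; the two-component network used as data (two copies of one parity bit, so that P = {(0, 0), (1, 1)}) is a hypothetical example chosen here for illustration and does not appear in the text, and all function names are assumptions.

```python
def make_detector(reachable_part):
    """Return lambda_D: 0 on the reachable part P, 1 everywhere else."""
    reachable = frozenset(reachable_part)
    return lambda state: 0 if state in reachable else 1


def single_component_changes(state, component_states):
    """Yield every network state that differs from `state` in exactly one coordinate."""
    for i, choices in enumerate(component_states):
        for value in choices:
            if value != state[i]:
                yield state[:i] + (value,) + state[i + 1:]


# Hypothetical data: two components that each store the same parity bit.
COMPONENT_STATES = [(0, 1), (0, 1)]
P = {(0, 0), (1, 1)}

detector = make_detector(P)

# Single-component changes of reachable states that the detector fails to flag.
missed = [
    (q, s)
    for q in sorted(P)
    for s in single_component_changes(q, COMPONENT_STATES)
    if detector(s) == 0
]
print(missed)
```

For a totally redundant network, Lemma 6.1 says that any change listed in `missed` would leave the corresponding state of M unchanged; in this small duplicated example no change escapes at all.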

Given $C \subseteq \{1, \ldots, n\}$, let $\pi_C = \{B_1, \ldots, B_\ell\}$. Then $C$ induces
a partition $\Gamma_C$ on $P$ where $\Gamma_C = \{B_1 \cap P, \ldots, B_\ell \cap P\} - \{\emptyset\}$.

If a partition $\pi$ of a set $L$ is a singleton cover then we will denote
this by writing $\pi = 0$. This notation is derived from the observation
that this partition is the least element of the lattice of all partitions
of $L$.

Corollary 6.2.1: Let $N$ be a state network of machines. Then
$(N, U_C)$ is $(D, 0)$-2-diagnosable for some combinational detector $D$
if and only if $\Gamma_{C_i} = 0$ for all $i$, $1 \le i \le n$.

Proof: Consider $N$ to be realizing the reduction of $N$. Then
$\eta_3$ is 1-1. By Theorems 3.2 and 3.3, $(N, U_C)$ is $(D, 0)$-2-diagnosable
for some combinational $D$ if and only if $(N, U_C)$ is $(D, 0)$-1-diagnosable
for some combinational $D$.

Now since $\eta_3$ is 1-1, so is $\eta_4^{-1}$. Therefore $\hat{C}_i$ is a singleton
cover if and only if $\Gamma_{C_i} = 0$. Hence $N$ is totally redundant if and
only if $\Gamma_{C_i} = 0$ for all $i$, $1 \le i \le n$.

The result now follows immediately from Theorem 6.2.

Example 6.4: Again consider network $N_1$ of Example 6.1. Let $\bar{N}_1$
be the associated state network which is obtained from $N_1$ by
changing the external output function and alphabet. Let $M'_1$ be the
state machine corresponding to the machine of Fig. 6.4. Then
$\bar{N}_1$ realizes $M'_1$ and $\hat{C}_1$ is the same in this case as in Example 6.2.
Hence $\bar{N}_1$ is not totally redundant and from Theorem 6.2 we know
that $(\bar{N}_1, U_C)$ is not $(D, 0)$-1-diagnosable for any combinational
detector $D$.

Now construct a new network $\bar{N}'_1$ from $\bar{N}_1$ by adding a new
component $M_3$ as shown in Fig. 6.5.

    $\bar{N}'_1 = (I, R, (M_1, M_2, M_3), (K_1, K_2, K_3))$

    $I$, $R$, $M_1$, $M_2$, $K_1$ and $K_2$ are identical to those
    of the network of Fig. 6.2.

    $K_3 = \{I\}$

    $M_3$:
        state  |  0      1    |  R
        -------+--------------+----
        $s_1$  |  $s_1$  $s_2$ |  r
        $s_2$  |  $s_2$  $s_1$ |

    Fig. 6.5. Network $\bar{N}'_1$

Network $\bar{N}'_1$ realizes machine $M'_1$ of this example under
$(e, e, \eta_3, \eta_4)$ where $\eta_3 = \eta_4^{-1}$ and where $\eta_4$ is given by the table:

    $q$  |  $\eta_4(q)$
    -----+------------------
    a    |  $(p_1, q_1, s_1)$
    b    |  $(p_1, q_2, s_2)$
    c    |  $(p_2, q_2, s_1)$
    d    |  $(p_2, q_1, s_2)$

For network $\bar{N}'_1$,

$$\pi_{C_1} = \pi_{\{2,3\}} = \{\{(p_1, q_1, s_1), (p_2, q_1, s_1)\}, \{(p_1, q_1, s_2), (p_2, q_1, s_2)\}, \{(p_1, q_2, s_1), (p_2, q_2, s_1)\}, \{(p_1, q_2, s_2), (p_2, q_2, s_2)\}\}$$

and $\hat{C}_1 = \{\{a\}, \{d\}, \{c\}, \{b\}\}$. Thus $\hat{C}_1$ is a singleton cover and
component $M_1$ is redundant. Similarly one can show that $M_2$ and
$M_3$ are redundant. Hence $\bar{N}'_1$ is totally redundant, and $(\bar{N}'_1, U_C)$ is
$(D, 0)$-1-diagnosable for some combinational detector $D$.


6.4 Construction of Combinationally Diagnosable Networks

In Example 6.4 we showed that a totally redundant network could
be constructed from network $\bar{N}_1$ through the addition of one
component machine. In this section we will show that this can be done for
any network. In addition, we derive upper and lower bounds on the
minimum number of states that such an additional component must
have.


Theorem 6.3: Let $N$ be a state network of machines. Let $m_i = |Q_i|$,
and let $m = \max_{1 \le i \le n} m_i$. A network $N'$ where $N'$ realizes $N$ and
$(N', U_C)$ is $(D, 0)$-2-diagnosable for some combinational detector $D$
can be constructed from $N$ by the addition of an $m$-state component.

Proof: Without loss of generality take $Q_i = \{0, \ldots, m_i - 1\}$. Let
$N = (I, R, (M_1, \ldots, M_n), (K_1, \ldots, K_n))$ and let
$N' = (I, R, (M_1, \ldots, M_n, M_{n+1}), (K_1, \ldots, K_n, K_{n+1}))$ where
$K_{n+1} = \{Q_1, \ldots, Q_n, I\}$ and where $M_{n+1}$ is constructed such that
for all $q = (q_1, \ldots, q_{n+1}) \in P'$, the reachable part of $N'$,
$\sum_{i=1}^{n+1} q_i \equiv 0 \pmod{m}$. A machine $M_{n+1}$ with
$m$ states which satisfies the above property is described below:

$$M_{n+1} = (I_{n+1}, Q_{n+1}, \delta_{n+1}, R, \rho_{n+1})$$

where

$$I_{n+1} = \left(\times_{i=1}^{n} Q_i\right) \times I$$

$$Q_{n+1} = \{0, \ldots, m-1\}$$

$$\rho_{n+1}(r) = -\sum_{i=1}^{n} \rho_i(r) \pmod{m} \quad \text{for all } r \in R$$

$$\delta_{n+1}(q_{n+1}, (q_1, \ldots, q_n, a)) = -\sum_{i=1}^{n} q'_i \pmod{m}$$

for all $q_i \in Q_i$, $1 \le i \le n+1$, and all $a \in I$, where
$(q'_1, \ldots, q'_n) = \delta((q_1, \ldots, q_n), a)$.

It is clear that $N'$ realizes $N$. Therefore, it remains only to
be shown that $(N', U_C)$ is $(D, 0)$-2-diagnosable for some combinational
detector $D$.

Let $D$ be the combinational machine which realizes the function
$\lambda_D : \times_{i=1}^{n+1} Q_i \to \{0, 1\}$ where

$$\lambda_D(q_1, \ldots, q_{n+1}) = \begin{cases} 0 & \text{if } \sum_{i=1}^{n+1} q_i \equiv 0 \pmod{m} \\ 1 & \text{otherwise} \end{cases}$$

Since $(q_1, \ldots, q_{n+1}) \in P'$ implies $\sum_{i=1}^{n+1} q_i \equiv 0 \pmod{m}$, no false alarms
will occur.

Let $(r, x, y)$ be a minimal 2-error caused by $f \in U_C$. Since
$(r, x, y)$ is a minimal error and $f$ only affects one component of $N'$,
$\beta_r(x)$ and $\beta^f_r(x)$ will differ in exactly one coordinate. Say
$\beta_r(x) = (q_1, \ldots, q_i, \ldots, q_{n+1})$ and $\beta^f_r(x) = (q_1, \ldots, q'_i, \ldots, q_{n+1})$.
Now $(q_1, \ldots, q_{n+1}) \in P'$ implies $\sum_{i=1}^{n+1} q_i \equiv 0 \pmod{m}$. Since $q_i \ne q'_i$
and $|Q_i| \le m$, $q_i \not\equiv q'_i \pmod{m}$. Therefore
$q_1 + \ldots + q'_i + \ldots + q_{n+1} \not\equiv 0 \pmod{m}$.
Hence, the error $(r, x, y)$ is detected without delay, and $(N', U_C)$ is
$(D, 0)$-2-diagnosable.
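
The construction in this proof amounts to a modular checksum over the component states. The Python sketch below is an illustration under stated assumptions: component states are encoded as integers 0, ..., m_i - 1 as in the proof, the sizes [2, 3, 4] are hypothetical, the function names are introduced here, and the check ranges over all state combinations rather than only the reachable ones.

```python
from itertools import product


def check_state(states, m):
    """State of the added component: the negated sum of the other states, mod m."""
    return (-sum(states)) % m


def detector(full_state, m):
    """Combinational detector: 0 if the coordinates sum to 0 mod m, otherwise 1."""
    return 0 if sum(full_state) % m == 0 else 1


def all_single_changes_detected(sizes):
    """Check that no fault-free state raises an alarm and that every change
    of exactly one coordinate (including the check component) is flagged."""
    m = max(sizes)
    ranges = [range(size) for size in sizes] + [range(m)]
    for states in product(*ranges[:-1]):
        full = states + (check_state(states, m),)
        if detector(full, m) != 0:
            return False
        for i, choices in enumerate(ranges):
            for value in choices:
                if value == full[i]:
                    continue
                faulty = full[:i] + (value,) + full[i + 1:]
                if detector(faulty, m) != 1:
                    return False
    return True


SIZES = [2, 3, 4]  # hypothetical |Q_1|, |Q_2|, |Q_3|
print(all_single_changes_detected(SIZES))
```

The check succeeds because two distinct values in the range 0, ..., m - 1 can never be congruent mod m, which is the step used at the end of the proof.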

In the proof of Theorem 6.3 we have given a construction which
can be used to form a totally redundant network from any network
of machines. This construction simply involves the addition of one
component to $N$. This theorem also gives an upper bound on the
amount of additional redundancy required to make a given network
totally redundant. This upper bound is stated in terms of the size of
the state set of the additional component.

The detector used in the proof of Theorem 6.3 simply checked
to see if the states of the components always summed to 0 (mod $m$).
By using a more complex detector, namely one which can determine
if the present state is in the reachable part, the number of states
which the additional component must have can be reduced.

Let $m'_i$ be the number of states that $M_i$, $1 \le i \le n$, can actually
enter while $M_i$ is a component of network $N$, and let $m' = \max_{1 \le i \le n} m'_i$.
That is, let $m' = \max_{1 \le i \le n} |P_i(P)|$, where $P_i(P)$ is the projection onto
coordinate $i$ of the reachable part of $N$. Then $m' \le m$ because
$P_i(P) \subseteq Q_i$, $1 \le i \le n$, and Theorem 6.3 holds with $m$ replaced by $m'$.
This claim is established in the following theorem.

Theorem 6.4: Let $N$ be a state network of machines. Let
$m'_i = |P_i(P)|$, and let $m' = \max_{1 \le i \le n} m'_i$. A network $N'$ can be
constructed from $N$ by the addition of an $m'$-state component such that
$N'$ realizes $N$ and $(N', U_C)$ is $(D, 0)$-2-diagnosable.

Proof: Without loss of generality take $P_i(P) = \{0, \ldots, m'_i - 1\}$ and
$Q_i = \{0, \ldots, m_i - 1\}$. Construct $N'$ by adding component $M_{n+1}$, where
$N'$ and $M_{n+1}$ are exactly as in the proof of Theorem 6.3 except for
$m$ being replaced by $m'$.

We will show that $(N', U_C)$ is $(D, 0)$-2-diagnosable by showing
that $\Gamma_{C_i} = 0$ for all $i$, $1 \le i \le n$, and then appealing to Corollary
6.2.1.

Assume, to the contrary, that $\Gamma_{C_i} \ne 0$ for some $i$, say for $i = 1$.
Let $\pi_{C_1} = \{B_1, \ldots, B_\ell\}$. Then for some $j$, $1 \le j \le \ell$,
$|B_j \cap P'| > 1$.
This implies the existence of two states $q = (q_1, q_2, \ldots, q_n)$ and
$q' = (q'_1, q_2, \ldots, q_n)$ such that $q, q' \in P'$ and $q_1 \ne q'_1$. Now $q, q' \in P'$
implies $q_1 + q_2 + \ldots + q_n \equiv 0 \pmod{m'}$ and $q'_1 + q_2 + \ldots + q_n \equiv 0
\pmod{m'}$. Hence, $q_1 \equiv q'_1 \pmod{m'}$ and since $0 \le q_1, q'_1 < m'$,
$q_1 = q'_1$. Contradiction. Therefore $\Gamma_{C_i} = 0$ for all $i$, $1 \le i \le n$,
and the result follows immediately from Corollary 6.2.1.

A technique similar to the one used in the proof of Theorem 6.3
could be used for the diagnosis of $n$ Mealy machines which operate
in parallel with the same inputs and resets. In this case one
additional Mealy machine would be required which had as many
output symbols as the machine with the largest output alphabet. There
is no guarantee, however, that this technique will result in a savings
over duplication, for the additional machine may need as many states
as the product of the number of states of the original $n$ machines.
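
The quantity m' used in Theorem 6.4 depends only on the projections of the reachable part. As a small illustration, the following Python sketch computes it from an explicitly listed reachable part; the reachable set and the state-set sizes shown are hypothetical values chosen here, and the function names are assumptions.

```python
def projection_sizes(reachable_part):
    """Return |P_i(P)| for each coordinate i of the reachable part P."""
    states = list(reachable_part)
    n = len(states[0])
    return [len({state[i] for state in states}) for i in range(n)]


def m_prime(reachable_part):
    """m' = max over i of |P_i(P)|."""
    return max(projection_sizes(reachable_part))


# Hypothetical two-component network with |Q_1| = 4 and |Q_2| = 3,
# of which only three network states are reachable.
STATE_SET_SIZES = [4, 3]
P = {(0, 0), (1, 1), (2, 0)}

print(projection_sizes(P), m_prime(P), max(STATE_SET_SIZES))
```

In this hypothetical case the added component needs at most 3 states under Theorem 6.4, compared with 4 under Theorem 6.3.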

We have shown that given a network $N$, a totally redundant
network $N'$ can be constructed through the addition of a component with
no more than $m'$ states where $m' = \max_{1 \le i \le n} |P_i(P)|$. This amount of
additional redundancy is not always necessary, for $N$ may already
be totally redundant. The following example shows that this amount
of additional redundancy is not necessary even if no component of
the network is redundant.
Example 6.5: Consider state network $N_2$ of Fig. 6.6.

    $N_2 = (I, R, (M_1, M_2), (K_1, K_2))$
    $I = \{0, 1, 2, 3, 4\}$, $R = \{r\}$
    $(K_1, K_2) = (\{I\}, \{I\})$

    [Transition tables of $M_1$ (states $p_1, \ldots, p_4$) and $M_2$ (states $q_1, \ldots, q_4$)]

    Fig. 6.6. Network $N_2$

$N_2$ realizes the state machine of Fig. 6.7 under $(e, e, \eta_3, \eta_4)$ where
$\eta_3 = \eta_4^{-1}$ and where $\eta_4$ is given by the following table:

    [Table of $\eta_4$]

    [Fig. 6.7. State machine realized by $N_2$]

Since the realized machine has 8 states and $|Q_1 \times Q_2| = 16$ it should
be clear that while $N_2$ is not totally redundant there is some redundancy
in this network realization. Thus if we were to add a component $M_3$ to $N_2$
in an attempt to form a totally redundant network $N'_2$ we should not
be too surprised if we succeeded with a component $M_3$ with fewer
than $m'$ states, where for network $N_2$, $m' = 4$. In fact, if the 2-state
machine $M_3 = (Q_1 \times Q_2 \times I, \{s_1, s_2\}, \delta_3)$ were added to $N_2$, where
$\delta_3$ is such that $M_3$ is in $s_1$ whenever $M_1$ and $M_2$ are in $(p_1, q_1)$,
$(p_2, q_2)$, $(p_3, q_3)$ or $(p_4, q_4)$ and in $s_2$ whenever $M_1$ and $M_2$ are in
$(p_1, q_2)$, $(p_2, q_1)$, $(p_3, q_4)$ or $(p_4, q_3)$, then the network $N'_2$ so formed
would be totally redundant.

An intuitively satisfying means to verify this claim is as follows.
Component $M_i$ computes the information $\hat{C}_{\{i\}}$ about the corresponding
state of $M$. In this case the $\hat{C}_{\{i\}}$ are the following partitions of
the state set of the realized machine:

    $\hat{C}_{\{1\}} = \{\{a, d\}, \{b, c\}, \{e, h\}, \{f, g\}\}$

    $\hat{C}_{\{2\}} = \{\{a, b\}, \{c, d\}, \{e, f\}, \{g, h\}\}$

    $\hat{C}_{\{3\}} = \{\{a, c, e, g\}, \{b, d, f, h\}\}$

Since $\hat{C}_{\{1\}} \cdot \hat{C}_{\{2\}} = \hat{C}_{\{2\}} \cdot \hat{C}_{\{3\}} = \hat{C}_{\{1\}} \cdot \hat{C}_{\{3\}} = 0$, any two
components taken together provide total information as to the
corresponding state of the realized machine. Hence the remaining one
will always be redundant.
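
The claim that every pairwise product of the three partitions in Example 6.5 is the zero partition can be checked mechanically. The Python sketch below is illustrative: the partitions are entered as they are read from the example (blocks {a,d},{b,c},{e,h},{f,g}; {a,b},{c,d},{e,f},{g,h}; {a,c,e,g},{b,d,f,h}), and the function names are assumptions introduced here.

```python
def product_of_partitions(p1, p2):
    """Product (meet) of two partitions: all nonempty pairwise block intersections."""
    return [b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2]


def is_zero_partition(partition):
    """The zero partition has only one-element blocks."""
    return all(len(block) == 1 for block in partition)


# Partitions computed by components 1, 2 and 3, as read from Example 6.5.
C1 = [set("ad"), set("bc"), set("eh"), set("fg")]
C2 = [set("ab"), set("cd"), set("ef"), set("gh")]
C3 = [set("aceg"), set("bdfh")]

PAIRWISE_ZERO = all(
    is_zero_partition(product_of_partitions(x, y))
    for x, y in [(C1, C2), (C2, C3), (C1, C3)]
)
print(PAIRWISE_ZERO)
```

Each product has eight one-element blocks, so any two components identify the state of the realized machine and the third is redundant.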
The following result gives a lower bound on the number of states
that an additional component must have in order for the resulting
augmented network to be totally redundant. If the network under
consideration is already totally redundant then the lower bound given
by this result is one. Since the behavior of a state machine with one
state is always a constant function, the actual addition of such a
component is unnecessary.


Theorem 6.5: Let $N$ be an $n$-component state network and let $N'$
be the state network formed from $N$ by the addition of a component
with $\ell$ states. If $N'$ is totally redundant then $\ell \ge \max_{1 \le i \le n} \#|\hat{C}_i|$.

Proof: Without loss of generality take $\#|\hat{C}_1| = \max_{1 \le i \le n} \#|\hat{C}_i|$, and
let $d = \#|\hat{C}_1|$. Then for some $q = (q_1, \ldots, q_n) \in Q$,
$|\eta_4^{-1}(f_{\pi_{C_1}}(q) \cap P)| = d$. That is, if it is known that $M_2$ is in $q_2$,
that $M_3$ is in $q_3$, and so forth up to $M_n$ being in $q_n$, then there is still
a $d$-state uncertainty as to which state of $M$ the state of $N$ currently
corresponds. It is necessary for $M_{n+1}$ to have at least $d$ states to
resolve this uncertainty.
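
The bound of Theorem 6.5 is simply the size of the largest block found in any of the covers. A short Python sketch follows, using as data the three covers quoted later in Example 6.6; the function names are assumptions introduced here for illustration.

```python
def largest_block(cover):
    """#|J|: the cardinality of the largest element of the cover J."""
    return max(len(block) for block in cover)


def lower_bound(covers):
    """d = max over i of #|C_i|, the lower bound of Theorem 6.5."""
    return max(largest_block(cover) for cover in covers)


# Covers of Example 6.6, one per omitted component.
COVERS_EX66 = [
    [set("ac"), set("bd"), set("eg"), set("fh")],
    [set("ae"), set("bh"), set("c"), set("d"), set("f"), set("g")],
    [set("a"), set("b"), set("cd"), set("ef"), set("gh")],
]

print(lower_bound(COVERS_EX66))
```

For a network that is already totally redundant every block has one element, so the same function returns 1, matching the remark preceding the theorem.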


The above result provides a good lower bound on the amount
of additional redundancy required to form a totally redundant network,
and it does so by taking into account the redundancy which already
exists in the network. This level of redundancy, however, is not
always sufficient because it may be impossible to find a component
with $d$ states which will simultaneously resolve the uncertainties
represented by $\hat{C}_1, \hat{C}_2, \ldots,$ and $\hat{C}_n$. The following describes just
such a situation.


Example 6.6: Consider the state network $N_3$ of Fig. 6.8.

    $N_3 = (I, R, (M_1, M_2, M_3), (K_1, K_2, K_3))$

    $I = \{0, 1, 2\}$, $R = \{r\}$

    $(K_1, K_2, K_3) = (\{I\}, \{I\}, \{Q_1, Q_2, I\})$

    $M_2$:
        state  |  0      1      2    |  R
        -------+---------------------+----
        $q_1$  |  $q_2$  $q_1$  $q_2$ |  r
        $q_2$  |  $q_2$  $q_1$  $q_1$ |

    [Transition tables of $M_1$ and $M_3$]

    Fig. 6.8. Network $N_3$

This network realizes the machine of Fig. 6.9.

        state  |  0  1  2  |  R
        -------+-----------+----
        a      |  e  b  f  |  r
        b      |  g  c  h  |
        c      |  g  c  g  |
        d      |  h  d  h  |
        e      |  e  b  a  |
        f      |  f  b  a  |
        g      |  g  c  b  |
        h      |  h  d  b  |

    Fig. 6.9. Machine realized by $N_3$

For $N_3$ realizing this machine we have

    $\hat{C}_1 = \{\{a, c\}, \{b, d\}, \{e, g\}, \{f, h\}\}$

    $\hat{C}_2 = \{\{a, e\}, \{b, h\}, \{c\}, \{d\}, \{f\}, \{g\}\}$

    $\hat{C}_3 = \{\{a\}, \{b\}, \{c, d\}, \{e, f\}, \{g, h\}\}$

Therefore $m = \max_{1 \le i \le 3} |Q_i| = 3$ and $d = \max_{1 \le i \le 3} \#|\hat{C}_i| = 2$.

Suppose that it is desired to add a component $M_4$ to $N_3$ in order
to form a totally redundant network. Theorem 6.5 tells us that $M_4$
must have at least 2 states, and Theorem 6.3 tells us that there is
a 3-state component which will work. We will show that in this case
it is not sufficient for $M_4$ to have 2 states.

Let $M_4$ be a 2-state component which when added to $N_3$ forms
$N'_3$. Let $\hat{C}'_{\{4\}} = \{B_1, B_2\}$. Since $\hat{C}'_{\{4\}}$ is a cover of $Q$, $B_1 \cup B_2 =
Q$. If $|B_1| \ge 5$ or $|B_2| \ge 5$ then $\hat{C}'_1$ would not be a singleton cover
because $M_2$ and $M_3$ have only 2 states each and together they could
not resolve a 5-state uncertainty. Therefore if $N'_3$ is to be
totally redundant we must have $|B_1|, |B_2| \le 4$ and thus $\hat{C}'_{\{4\}}$ will
be a partition of $Q$.

For $N'_3$ to be totally redundant $M_4$ must resolve the following
pairs of states: $\{a, c\}$, $\{b, d\}$, $\{e, g\}$, $\{f, h\}$, $\{a, e\}$, $\{b, h\}$, $\{c, d\}$,
$\{e, f\}$, and $\{g, h\}$. It can resolve a pair only if the pair is split
between $B_1$ and $B_2$. But this is easily seen to be impossible. Therefore
there is no 2-state component which when added to $N_3$ will form
a totally redundant network.
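
The impossibility asserted in Example 6.6 can be confirmed by exhaustive search, since there are only 2^8 ways to assign one of two labels to the eight states. The Python sketch below is an illustration under stated assumptions: the blocks to be resolved are taken from the three covers listed in the example, the function names are introduced here, and the search tests only the necessary condition that each block be split, not whether a realizing machine exists.

```python
from itertools import product

STATES = "abcdefgh"

# Covers of Example 6.6, one per omitted original component.
COVERS = [
    [set("ac"), set("bd"), set("eg"), set("fh")],
    [set("ae"), set("bh"), set("c"), set("d"), set("f"), set("g")],
    [set("a"), set("b"), set("cd"), set("ef"), set("gh")],
]


def blocks_to_resolve(covers):
    """Blocks with more than one state; the new component must separate their members."""
    return {frozenset(block) for cover in covers for block in cover if len(block) > 1}


def resolving_labelings(states, blocks, num_labels):
    """All assignments of labels to states that give distinct labels within every block."""
    found = []
    for labels in product(range(num_labels), repeat=len(states)):
        label = dict(zip(states, labels))
        if all(len({label[s] for s in block}) == len(block) for block in blocks):
            found.append(label)
    return found


BLOCKS = blocks_to_resolve(COVERS)
TWO_STATE = resolving_labelings(STATES, BLOCKS, 2)
print(len(BLOCKS), len(TWO_STATE))
```

The search finds no two-label assignment. One way to see why: the pairs {a,c}, {c,d}, {d,b}, {b,h}, {h,g}, {g,e}, {e,a} form a cycle of odd length, and the two labels cannot alternate around such a cycle.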



CHAPTER VII


Conclusion 

In this report a fresh look at on-line diagnosis was taken 
from a system theoretic point of view. The approach used in this 
investigation was system theoretic in the sense that resettable dis- 
crete-time systems were used as a basis for a well-developed 
formal model of on-line diagnosis, and formal methods were used 
to investigate this model. As evidenced by the results in Chapters 
III through VI this approach has proved to be very fruitful. One 
advantage of this approach is that the results developed in this 
report are independent of any particular technology and may be 
applied to any system which can be modeled as a resettable machine. 

In Chapter II a complete model for the study of on-line 
diagnosis was developed, and a number of fundamental questions 
concerning on-line diagnosis were stated. Subsequent chapters 
provided some answers to these questions for the unrestricted fault 
case and the unrestricted component fault case. However, much 
more work remains to be done which could be carried out 
along the lines presented below. 

Except for some of the examples and for the networks considered 
in Chapter VI we have been dealing with abstract (i. e. , totally 
unstructured) systems. Such an approach is good for developing 
formally the concepts involved in our theory and for studying the
diagnosis of unrestricted faults, but some of the questions raised
can best be studied in a more structured environment. One reason 
for this is that with a structured system we can consider the causes 
of faults. For example, given an abstract system it makes no sense 
to speak of the set of faults caused by component failures of a cer- 
tain type or by bridging failures. However, given a structured 
representation of a system (e. g. , a circuit diagram) we can discuss 
these and other types of failures (causes) and determine the
resulting faults (effects).

There are many different structural levels that could prove 
useful to a further investigation into the theory of on-line diagnosis. 
Two levels which we believe will be important are the binary
state-assigned level and the logical circuit level. These levels and the
basis for their potential usefulness are explained below.

A machine $M$ is said to be binary state-assigned if $Q = \{0, 1\}^n$
for some positive integer $n$. Given such a machine we can speak
of stuck-at-0, stuck-at-1, and any other type of memory failure.
The faults corresponding to these failures can be enumerated and 
comparisons can be made between various schemes for diagnosing 
these faults. Memory faults have been studied before in other con- 
texts and they are an important class of faults for a number of 
reasons. As we have seen, only a limited amount of structure is 
needed to discuss them. Thus memory faults can be analyzed 
before the circuit design of the machine is complete. Also, it is
memory which distinguishes truly sequential systems from purely
combinational (one-state) systems. Combinational systems are
inherently easier than sequential systems to analyze and a number
of techniques for the on-line diagnosis of such systems are known
(see [21] and [33], for example).

A system possesses structure at the logical circuit level if 
a representation of the system is given in terms of a logical circuit 
composed of primitive logical elements. These may be of the AND- 
OR variety, threshold elements, or any similar elements of a "build- 
ing block" nature depending upon the technology being considered. 

This level is useful for investigating failures in the primitive 
components. The circuit in Fig. 2. 2 is an example of a structural 
representation at this level and the failure of this circuit discussed 
in Example 2. 2 is a simple example of the analysis that can be 
conducted at this level. 

Further work could also be performed at the network level 
of structural detail which was introduced in Chapter VI. At this 
level one could study the problem of implementing on-line diagnosis 
on a whole computer whereas with the other levels the emphasis 
would be on diagnosing one module. Note that in our definition of 
diagnosis the detector is not constrained to give simply a yes-no
response. It could also provide extra information for use in
automatic fault location. Thus, at this level, the problem of which
subsystems must be explicitly observed by the detector to achieve
some desired fault location property could be studied.

One problem that requires extension of our present model 
(at any structural level) is the problem of automatic reconfiguration 
of the system under the control of the detector. To study this 
problem, the model used would have to allow for feedback from the 
detector to the system it is observing. The question of how such 
an extension should be made is an interesting one and, if answered 
satisfactorily, could serve as a basis for a systematic investigation 
of reconfiguration techniques.



APPENDIX


Resettable Machine Theory 

Our goal in the appendix is not to study the theory of resettable 
machines per se but rather to cover that part of it which is used in 
this study of on-line diagnosis. The theory of resettable machines 
follows closely the theory of sequential machines. The main 
differences in the definitions stem from the presupposition that a
resettable machine is reset before every use. One consequence of 
this is that the "unreachable" states of a resettable machine are 
always ignored. 

We begin by repeating here the basic machine notions introduced 
in Chapter II. 

Let $M$ be a resettable machine. The reachable part of $M$,
denoted by $P$, is the set

$$P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^*\}.$$

$M$ is reachable if $P = Q$. $M$ is $\ell$-reachable if

$$P = \{\delta(\rho(r), x) \mid r \in R,\ x \in I^* \text{ and } |x| \le \ell\}.$$
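
Both definitions translate directly into a breadth-first search from the reset states. The Python sketch below is illustrative only: the dictionary encoding of the transition function and reset function, the function names, and the small counter machine used as data are all assumptions introduced here.

```python
from collections import deque


def reachable_within(delta, rho, inputs, max_len=None):
    """States delta(rho(r), x) over all resets r and input strings x with
    |x| <= max_len (all strings when max_len is None)."""
    depth = {q: 0 for q in rho.values()}
    queue = deque(depth)
    while queue:
        q = queue.popleft()
        if max_len is not None and depth[q] >= max_len:
            continue  # do not extend strings beyond the length bound
        for a in inputs:
            nxt = delta[q, a]
            if nxt not in depth:
                depth[nxt] = depth[q] + 1
                queue.append(nxt)
    return set(depth)


def least_ell(delta, rho, inputs):
    """Smallest l for which the machine is l-reachable in the sense above."""
    full = reachable_within(delta, rho, inputs)
    ell = 0
    while reachable_within(delta, rho, inputs, ell) != full:
        ell += 1
    return ell


# Hypothetical machine: a mod-4 counter (input 1 increments, input 0 holds)
# together with a state 9 that no reset or input sequence ever reaches.
INPUTS = [0, 1]
DELTA = {}
for q in range(4):
    DELTA[q, 0] = q
    DELTA[q, 1] = (q + 1) % 4
DELTA[9, 0] = DELTA[9, 1] = 9
RHO = {"r": 0}

print(reachable_within(DELTA, RHO, INPUTS), least_ell(DELTA, RHO, INPUTS))
```

Because breadth-first search assigns each state its shortest distance from a reset state, cutting the search off at depth l yields exactly the states reachable by strings of length at most l.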

Let $M, M' \in \mathcal{M}(I, Z, R)$. $M$ is equivalent to $M'$ (written $M \equiv M'$)
if $\beta_r = \beta'_r$ for all $r \in R$. Two states $q \in Q$ and $q' \in Q'$ are
equivalent ($q \equiv q'$) if $\beta_q = \beta'_{q'}$. It is easily verified that these are
both equivalence relations, the first on $\mathcal{M}(I, Z, R)$ and the second on
the states of machines in $\mathcal{M}(I, Z, R)$. $M$ is reduced if for all
$q, q' \in P$, $q \equiv q'$ implies $q = q'$.

If $M$ and $M'$ are two resettable machines then $M$ realizes $M'$ if
there is a triple of functions $(\sigma_1, \sigma_2, \sigma_3)$ where $\sigma_1 : (I')^+ \to I^+$ is a
semigroup homomorphism such that $\sigma_1(I') \subseteq I$, $\sigma_2 : R' \to R$,
$\sigma_3 : Z'' \to Z'$ where $Z'' \subseteq Z$, such that for all $r' \in R'$,

$$\beta'_{r'} = \sigma_3 \circ \beta_{\sigma_2(r')} \circ \sigma_1.$$

The following result is analogous to the result due to Leake [23]
which was cited in Section 2.2. It supplies us with an alternative,
and structurally oriented, definition of realization.

Theorem A.1: Let $M$ and $M'$ be two resettable machines with reachable
parts $P$ and $P'$. $M$ realizes $M'$ if and only if there exists a
4-tuple of functions $(\eta_1, \eta_2, \eta_3, \eta_4)$ where

    $\eta_1 : I' \to I$

    $\eta_2 : R' \to R$

    $\eta_3 : Z \to Z'$

    $\eta_4 : P' \to \mathcal{P}(P) - \{\emptyset\}$    ($\mathcal{P}(P) = \{X \mid X \subseteq P\}$)

such that

i) $\delta(\eta_4(p'), \eta_1(a)) \subseteq \eta_4(\delta'(p', a))$ for all $p' \in P'$ and $a \in I'$

ii) $\eta_3(\lambda(p, \eta_1(a))) = \lambda'(p', a)$ for all $p' \in P'$, $a \in I'$, and $p \in \eta_4(p')$

iii) $\rho(\eta_2(r')) \in \eta_4(\rho'(r'))$ for all $r' \in R'$.

Thus for each $p' \in P'$ there is a $p \in P$ such that

$$\beta'_{p'}(v) = \sigma_3(\beta_p(\sigma_1(v))).$$

Consider $\eta_4 : P' \to \mathcal{P}(P) - \{\emptyset\}$ defined by

$$\eta_4(p') = \{p \in P \mid \beta'_{p'} = \sigma_3 \circ \beta_p \circ \sigma_1\}$$

and consider $\eta_1 : I' \to I$ defined by

$$\eta_1(a) = \sigma_1(a).$$

Claim: The 4-tuple $(\eta_1, \sigma_2, \sigma'_3, \eta_4)$, where $\sigma'_3$ is an arbitrary extension
of $\sigma_3$ to $Z$, satisfies i), ii), and iii).

i) Let $p \in \eta_4(p')$. We must show $\delta(p, \eta_1(a)) \in \eta_4(\delta'(p', a))$.

$$\begin{aligned}
\beta'_{\delta'(p', a)}(x) &= \beta'_{p'}(ax) \\
&= \sigma_3(\beta_p(\sigma_1(ax))) \\
&= \sigma_3(\beta_{\delta(p, \sigma_1(a))}(\sigma_1(x))) \\
&= \sigma_3(\beta_{\delta(p, \eta_1(a))}(\sigma_1(x))).
\end{aligned}$$

Hence, $\delta(p, \eta_1(a)) \in \eta_4(\delta'(p', a))$.

ii) Let $p \in \eta_4(p')$. We must show
$\sigma'_3(\lambda(p, \eta_1(a))) = \lambda'(p', a)$.

$$\begin{aligned}
\lambda'(p', a) &= \beta'_{p'}(a) \\
&= \sigma_3(\beta_p(\eta_1(a))) \\
&= \sigma'_3(\lambda(p, \eta_1(a))).
\end{aligned}$$

iii) Let $r' \in R'$. We must show $\rho(\sigma_2(r')) \in \eta_4(\rho'(r'))$.

$$\beta'_{r'}(x) = \sigma_3(\beta_{\sigma_2(r')}(\sigma_1(x)))$$

implies

$$\rho(\sigma_2(r')) \in \eta_4(\rho'(r')).$$

(Sufficiency) Suppose there exist functions $(\eta_1, \eta_2, \eta_3, \eta_4)$ as in the
statement of the theorem. Let $\sigma_1 : (I')^+ \to I^+$ be the natural
extension of $\eta_1$ to sequences. That is, $\sigma_1(a_1 \ldots a_n) = \eta_1(a_1) \ldots \eta_1(a_n)$.
Claim: $M$ realizes $M'$ under $(\sigma_1, \eta_2, \eta_3)$. Consider $\xi : P' \to P$
where

    $\xi(p')$ = some $p \in \eta_4(p')$ such that
    $\rho(\eta_2(r')) = \xi(\rho'(r'))$ for all $r' \in R'$.

Let $x = ya$ where $a \in I'$. Then

$$\begin{aligned}
\eta_3(\beta_{\eta_2(r')}(\sigma_1(x))) &= \eta_3(\beta_{\rho(\eta_2(r'))}(\sigma_1(x))) \\
&= \eta_3(\beta_{\xi(\rho'(r'))}(\sigma_1(x))) \\
&= \eta_3(\lambda(\delta(\xi(\rho'(r')), \sigma_1(y)), \sigma_1(a))) \\
&= \eta_3(\lambda(p, \sigma_1(a))) \quad \text{where } p \in \eta_4(\delta'(\rho'(r'), y)) \\
&= \lambda'(\delta'(\rho'(r'), y), a) \\
&= \beta'_{\rho'(r')}(ya) \\
&= \beta'_{r'}(x).
\end{aligned}$$

This completes the proof of Theorem A.1.

Theorem A.2: If $M$ realizes $M'$ and $M'$ is reduced and reachable then
$|Q| \ge |Q'|$.

Proof: Assume that $M$ realizes $M'$ under $(\sigma_1, \sigma_2, \sigma_3)$ and that $M'$
is reduced and reachable. Then $\beta'_r = \sigma_3 \circ \beta_{\sigma_2(r)} \circ \sigma_1$ for all $r \in R'$.
Let $q' \in Q'$. Then there exist $r \in R'$ and $x \in (I')^*$ such that
$q' = \delta'(\rho'(r), x)$. Now

$$\begin{aligned}
\beta'_{q'}(y) &= \beta'_{\delta'(\rho'(r), x)}(y) \\
&= \beta'_r(xy) \\
&= \sigma_3(\beta_{\sigma_2(r)}(\sigma_1(xy))) \\
&= \sigma_3(\beta_{\delta(\rho(\sigma_2(r)), \sigma_1(x))}(\sigma_1(y))).
\end{aligned}$$

Hence there exists a function $f : Q' \to Q$ such that for each $q' \in Q'$,

$$\beta'_{q'} = \sigma_3 \circ \beta_{f(q')} \circ \sigma_1.$$

To prove that $|Q| \ge |Q'|$, it suffices to show that $f$ is 1-1. Let
$q_1, q_2 \in Q'$ and assume that $f(q_1) = f(q_2)$. Then
$\beta'_{q_1} = \sigma_3 \circ \beta_{f(q_1)} \circ \sigma_1 = \sigma_3 \circ \beta_{f(q_2)} \circ \sigma_1 = \beta'_{q_2}$.
Since $M'$ is reduced and reachable this implies
that $q_1 = q_2$. Hence $f$ is 1-1. This establishes the result.


Theorem A.3: The relation "realizes" is transitive. That is, $M$ realizes $M'$
and $M'$ realizes $M''$ implies $M$ realizes $M''$.

Proof: (Sketch) Assume that $M$ realizes $M'$ under $(\sigma_1, \sigma_2, \sigma_3)$ and
that $M'$ realizes $M''$ under $(\sigma'_1, \sigma'_2, \sigma'_3)$. Then
$\beta'_{r'} = \sigma_3 \circ \beta_{\sigma_2(r')} \circ \sigma_1$ for all $r' \in R'$ and
$\beta''_{r''} = \sigma'_3 \circ \beta'_{\sigma'_2(r'')} \circ \sigma'_1$ for all $r'' \in R''$. It follows
that $\beta''_{r''} = \sigma'_3 \circ \sigma_3 \circ \beta_{\sigma_2(\sigma'_2(r''))} \circ \sigma_1 \circ \sigma'_1$. That is, $M$ realizes
$M''$ under $(\sigma_1 \circ \sigma'_1, \sigma_2 \circ \sigma'_2, \sigma'_3 \circ \sigma_3)$.


If $M$ and $M'$ are resettable machines then $M$ is isomorphic to $M'$
if there exist four 1-1 and onto functions

    $\omega_1 : I \to I'$
    $\omega_2 : R \to R'$
    $\omega_3 : Z \to Z'$
    $\omega_4 : P \to P'$

such that for all $r \in R$, $a \in I$, and $q \in P$

i) $\omega_4(\delta(q, a)) = \delta'(\omega_4(q), \omega_1(a))$

ii) $\omega_3(\lambda(q, a)) = \lambda'(\omega_4(q), \omega_1(a))$

iii) $\omega_4(\rho(r)) = \rho'(\omega_2(r))$.

The 4-tuple $(\omega_1, \omega_2, \omega_3, \omega_4)$ is called an isomorphism of $M$ onto $M'$.
If $M, M' \in \mathcal{M}(I, Z, R)$ and $(e, e, e, \omega_4)$ is an isomorphism of $M$ onto $M'$,
then $M$ is strongly isomorphic to $M'$. A basic result of sequential
machine theory states that for every machine there is an equivalent
reduced machine and that this machine is unique up to strong
isomorphism. The corresponding result for resettable machines is
given by Theorem A.4 and Corollary A.6.1.

Theorem A.4: For every resettable machine $M$ there is a reduced and reachable machine $M_R$ equivalent to $M$.

Proof: Let $M = (I, Q, Z, \delta, \lambda, R, \rho)$ and let $M_R = (I, Q_R, Z, \delta_R, \lambda_R, R, \rho_R)$ where

$Q_R = \{[q] \mid q \in P\}$  ($[q] = \{q' \mid q' \equiv q\}$)

$\delta_R([q], a) = [\delta(q, a)]$

$\lambda_R([q], a) = \lambda(q, a)$

$\rho_R(r) = [\rho(r)]$

To prove this result we must verify (1) that $\delta_R$ and $\lambda_R$ are well-defined, (2) that $M_R$ is reduced and reachable, and (3) that $M \equiv M_R$.

The details of this proof are very similar to the details of the corresponding result in sequential machine theory. They may be found in many textbooks which cover this theory (e.g., see Arbib [2]).
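The construction of $M_R$ can be carried out for a finite machine by restricting to the reachable states and then refining a partition of them until it stabilizes. The following is a minimal sketch under stated assumptions: the dictionary encoding of $\delta$, $\lambda$, $\rho$, the function names, and the example machine are hypothetical, and the refinement loop is the standard partition-refinement procedure from sequential machine theory rather than anything prescribed by the text.

```python
def reachable(inputs, delta, rho):
    """Return P, the set of states reachable from some reset state."""
    seen = set(rho.values())
    stack = list(seen)
    while stack:
        q = stack.pop()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def renumber(signature):
    """Replace each distinct signature by a small integer block id."""
    ids = {}
    return {q: ids.setdefault(sig, len(ids)) for q, sig in signature.items()}


def reduce_machine(inputs, delta, lam, rho):
    """Build (Q_R, delta_R, lam_R, rho_R) with equivalence classes as states."""
    P = reachable(inputs, delta, rho)
    # Start from the partition induced by the one-step outputs.
    block = renumber({q: tuple(lam[(q, a)] for a in inputs) for q in P})
    while True:
        # Split blocks whose members lead to different blocks.
        refined = renumber({
            q: (block[q],) + tuple(block[delta[(q, a)]] for a in inputs)
            for q in P
        })
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    cls = {q: frozenset(p for p in P if block[p] == block[q]) for q in P}
    delta_R = {(cls[q], a): cls[delta[(q, a)]] for q in P for a in inputs}
    lam_R = {(cls[q], a): lam[(q, a)] for q in P for a in inputs}
    rho_R = {r: cls[q] for r, q in rho.items()}
    return set(cls.values()), delta_R, lam_R, rho_R


# Hypothetical machine: states 1 and 2 are equivalent, state 3 is unreachable.
inputs = ["a", "b"]
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 0, (1, "b"): 1,
         (2, "a"): 0, (2, "b"): 2, (3, "a"): 3, (3, "b"): 3}
lam = {(0, "a"): 0, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0,
       (2, "a"): 1, (2, "b"): 0, (3, "a"): 1, (3, "b"): 1}
rho = {"r": 0}

Q_R, delta_R, lam_R, rho_R = reduce_machine(inputs, delta, lam, rho)
print(sorted(sorted(c) for c in Q_R))
```

Because each class is built from states that agree on outputs and on the classes of their successors, `delta_R` and `lam_R` receive the same value whichever representative of a class is used, which is the well-definedness required in step (1).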

$M_R$ as defined above is called the reduction of $M$. $M'$ is a reduced form of $M$ if $M'$ is reduced and $M \equiv M'$.

Lemma A.5: $M \equiv M'$ implies $\beta_{\delta(\rho(r),x)} = \beta'_{\delta'(\rho'(r),x)}$ for all $r \in R$ and $x \in I^*$.

Proof: Let $a \in I$, $x, y \in I^*$ and $r \in R$. Then

$M \equiv M' \Rightarrow \beta_r(xya) = \beta'_r(xya)$

$\Rightarrow \lambda(\delta(\rho(r),xy), a) = \lambda'(\delta'(\rho'(r),xy), a)$

$\Rightarrow \lambda(\delta(\delta(\rho(r),x),y), a) = \lambda'(\delta'(\delta'(\rho'(r),x),y), a)$

$\Rightarrow \beta_{\delta(\rho(r),x)}(ya) = \beta'_{\delta'(\rho'(r),x)}(ya)$.

Theorem A.6: If $M$ and $M'$ are both reduced and $M \equiv M'$ then $M$ is strongly isomorphic to $M'$.

Proof: Assume that $M$ and $M'$ are reduced and that $M \equiv M'$. We know that each $q \in P$ is representable in the form $\delta(\rho(r),x)$. Define $\omega_4 : P \to P'$ by

$\omega_4(\delta(\rho(r),x)) = \delta'(\rho'(r),x)$.

Claim: $M$ is strongly isomorphic to $M'$ under $(e, e, e, \omega_4)$. We must show that $\omega_4$ is well-defined, 1-1 and onto and that for all $r \in R$, $a \in I$ and $q \in P$

i) $\omega_4(\delta(q,a)) = \delta'(\omega_4(q), a)$

ii) $\lambda(q,a) = \lambda'(\omega_4(q), a)$

iii) $\omega_4(\rho(r)) = \rho'(r)$.





In the following we denote $\omega_4(q)$ by $q'$.

Well-defined: Let $p = \delta(\rho(r),x)$ and $q = \delta(\rho(s),y)$, and suppose that $p = q$. Then $\beta_{\delta(\rho(r),x)} = \beta_{\delta(\rho(s),y)}$ and thus by Lemma A.5, $\beta'_{\delta'(\rho'(r),x)} = \beta'_{\delta'(\rho'(s),y)}$. That is, $\beta'_{p'} = \beta'_{q'}$. Since $M'$ is reduced and $p', q' \in P'$ it follows that $p' = q'$. Hence $\omega_4$ is well-defined.

1-1: Again let $p = \delta(\rho(r),x)$ and $q = \delta(\rho(s),y)$ but now suppose that $p \ne q$. Since $M$ is reduced, $\beta_p \ne \beta_q$, and reapplying the above argument gives $\beta'_{p'} \ne \beta'_{q'}$, so $p' \ne q'$. Hence $\omega_4$ is 1-1.

Onto: Since every $q' \in P'$ is representable in the form $\delta'(\rho'(r),x)$, $\omega_4$ is onto.

That i), ii), and iii) are satisfied is straightforward to verify. 
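For finite machines the map $\omega_4$ of the proof can be built by running both machines in parallel from corresponding reset states. The following is a minimal sketch, assuming both machines share $I$ and $R$ and are given as `(delta, lam, rho)` triples of dictionaries; the function name `build_w4`, the encoding, and the example machines are hypothetical. The function returns `None` when the pairing is inconsistent, which cannot happen if both machines are reduced and equivalent.

```python
from collections import deque


def build_w4(inputs, M, Mp):
    """Pair delta(rho(r), x) with delta'(rho'(r), x) for every r and x."""
    delta, lam, rho = M
    deltap, lamp, rhop = Mp
    w4 = {}
    # Condition iii): reset states correspond under the identity on R.
    todo = deque((rho[r], rhop[r]) for r in rho)
    while todo:
        q, qp = todo.popleft()
        if q in w4:
            if w4[q] != qp:
                return None        # w4 would not be well-defined
            continue
        w4[q] = qp
        for a in inputs:
            if lam[(q, a)] != lamp[(qp, a)]:
                return None        # condition ii) fails
            # Condition i) is enforced by pairing the successors.
            todo.append((delta[(q, a)], deltap[(qp, a)]))
    if len(set(w4.values())) != len(w4):
        return None                # w4 would not be 1-1
    return w4


# Hypothetical reduced machines that differ only in the names of their states.
inputs = ["a", "b"]
M = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
     {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0},
     {"r": 0})
Mp = ({("x", "a"): "y", ("x", "b"): "x", ("y", "a"): "x", ("y", "b"): "y"},
      {("x", "a"): 0, ("x", "b"): 1, ("y", "a"): 1, ("y", "b"): 0},
      {"r": "x"})
# Same transitions as Mp but a different output in state "x": not equivalent to M.
Mq = (Mp[0],
      {("x", "a"): 1, ("x", "b"): 1, ("y", "a"): 1, ("y", "b"): 0},
      Mp[2])

print(build_w4(inputs, M, Mp))
```

Every reachable state of the second machine is visited along the same input string that reaches its partner, so the returned map is onto $P'$ for the same reason given in the proof.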

Corollary A.6.1: The reduced form of $M$ is unique up to strong isomorphism. That is, if $M'$ and $M''$ are reduced forms of $M$ then $M'$ is strongly isomorphic to $M''$.

Proof: If $M'$ and $M''$ are reduced forms of $M$ then $M \equiv M'$ and $M \equiv M''$. Hence $M' \equiv M''$. Since $M'$ and $M''$ are both reduced, by Theorem A.6, $M'$ is strongly isomorphic to $M''$.

Theorem A.7: If $M \equiv M'$ then $M$ realizes $M'$.

Proof: $M \equiv M'$ implies $\beta_r = \beta'_r$ for all $r \in R$. Hence $M$ realizes $M'$ under $(e, e, e)$.

A resettable machine $M$ is autonomous if $|I| = 1$.

Given a resettable machine $M$, two input symbols $a, b \in I$ are equivalent ($a \equiv b$) if $\lambda(q,a) = \lambda(q,b)$ and $\delta(q,a) = \delta(q,b)$ for all $q \in P$. $M$ is transition distinct if no two of its input symbols are equivalent. Any machine which has equivalent inputs is redundant in the sense that the inputs in an equivalence class can be represented by any one of its members without affecting the capabilities of the machine. The following results give an alternative characterization of equivalent inputs.

Theorem A.8: Let $M$ be a resettable machine, and let $a, b \in I$. Then $a \equiv b$ if and only if for all $x, y \in I^*$ and $r \in R$, $\beta_r(xay) = \beta_r(xby)$.

Proof: (Necessity) Suppose $a \equiv b$ and assume, to the contrary, that $\beta_r(xay) \ne \beta_r(xby)$ for some $r \in R$ and $x, y \in I^*$. Let $q = \delta(\rho(r),x)$. Now, $\beta_r(xay) \ne \beta_r(xby)$ implies $\beta_q(ay) \ne \beta_q(by)$. If $y = \Lambda$ then $\lambda(q,a) \ne \lambda(q,b)$. If $y \in I^+$ then $\beta_{\delta(q,a)}(y) \ne \beta_{\delta(q,b)}(y)$ and hence $\delta(q,a) \ne \delta(q,b)$. Therefore $a \not\equiv b$. Contradiction. Hence $a \equiv b$ implies $\beta_r(xay) = \beta_r(xby)$ for all $x, y \in I^*$ and $r \in R$.

(Sufficiency) Assume that $a \not\equiv b$. Then for some $q \in P$, $\lambda(q,a) \ne \lambda(q,b)$ or $\delta(q,a) \ne \delta(q,b)$. Let $q = \delta(\rho(r),x)$. Then $\lambda(\delta(\rho(r),x), a) \ne \lambda(\delta(\rho(r),x), b)$ or $\delta(\rho(r),xa) \ne \delta(\rho(r),xb)$. Hence $\beta_r(xa) \ne \beta_r(xb)$ or for some $y \in I^+$, $\beta_r(xay) \ne \beta_r(xby)$. Therefore if $\beta_r(xay) = \beta_r(xby)$ for all $r \in R$ and $x, y \in I^*$ then $a \equiv b$.
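Both characterizations of equivalent inputs can be compared on a small machine. The following is a minimal sketch, assuming a dictionary encoding of $\delta$, $\lambda$ and $\rho$ and taking $\beta_r(x)$ to be the last output produced when $x$ is applied from $\rho(r)$, as in the proof of Lemma A.5. The function names and the example machine are hypothetical, and the check over strings is bounded by `max_len`, so it illustrates the theorem on one example rather than proving it.

```python
from itertools import product


def reachable(inputs, delta, rho):
    """Return P, the set of states reachable from some reset state."""
    seen = set(rho.values())
    stack = list(seen)
    while stack:
        q = stack.pop()
        for a in inputs:
            nxt = delta[(q, a)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def equivalent(a, b, inputs, delta, lam, rho):
    """Definition: a and b agree on output and next state for every q in P."""
    P = reachable(inputs, delta, rho)
    return all(lam[(q, a)] == lam[(q, b)] and delta[(q, a)] == delta[(q, b)]
               for q in P)


def beta(r, word, delta, lam, rho):
    """Last output produced when the non-empty word is applied from rho(r)."""
    q, out = rho[r], None
    for a in word:
        out = lam[(q, a)]
        q = delta[(q, a)]
    return out


def same_response(a, b, inputs, delta, lam, rho, max_len=2):
    """Check beta_r(xay) == beta_r(xby) for all x, y of length <= max_len."""
    for r in rho:
        for n, m in product(range(max_len + 1), repeat=2):
            for x in product(inputs, repeat=n):
                for y in product(inputs, repeat=m):
                    wa, wb = x + (a,) + y, x + (b,) + y
                    if beta(r, wa, delta, lam, rho) != beta(r, wb, delta, lam, rho):
                        return False
    return True


# Hypothetical machine: inputs "a" and "b" both toggle the state, "c" holds it.
inputs = ["a", "b", "c"]
delta = {(q, s): (1 - q if s in "ab" else q) for q in (0, 1) for s in inputs}
lam = {(q, s): (q if s in "ab" else 1 - q) for q in (0, 1) for s in inputs}
rho = {"r": 0}

print(equivalent("a", "b", inputs, delta, lam, rho),
      same_response("a", "b", inputs, delta, lam, rho))
```

In this example the two tests agree for every pair of inputs: "a" and "b" pass both, while "a" and "c" fail both, so the machine is not transition distinct.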



